M208 


Pure mathematics 


Book C 
Linear algebra 


The Open 
University 


This publication forms part of an Open University module. Details of this and other Open University modules 
can be obtained from Student Recruitment, The Open University, PO Box 197, Milton Keynes MK7 6BJ, 
United Kingdom (tel. +44 (0)300 303 5303; email general-enquiries@open.ac.uk). 


Alternatively, you may visit the Open University website at www.open.ac.uk where you can learn more about 
the wide range of modules and packs offered at all levels by The Open University. 


The Open University, Walton Hall, Milton Keynes, MK7 6AA. 
First published 2018. 
Copyright © 2018 The Open University 


All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, transmitted or 
utilised in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without 
written permission from the publisher or a licence from the Copyright Licensing Agency Ltd. Details of such 
licences (for reprographic reproduction) may be obtained from the Copyright Licensing Agency Ltd, Barnard’s 
Inn, 86 Fetter Lane, London EC4A 1EN (website www.cla.co.uk). 


Open University materials may also be made available in electronic formats for use by students of the 
University. All rights, including copyright and related rights and database rights, in electronic materials and 
their contents are owned by or licensed to The Open University, or otherwise used by The Open University as 
permitted by applicable law. 


In using electronic materials and their contents you agree that your use will be solely for the purposes of 
following an Open University course of study or otherwise as licensed by The Open University or its assigns. 


Except as permitted above you undertake not to copy, store in any medium (including electronic storage or use 
in a website), distribute, transmit or retransmit, broadcast, modify or show in public such electronic materials in 
whole or in part without the prior written consent of The Open University or in accordance with the Copyright, 
Designs and Patents Act 1988. 


Edited, designed and typeset by The Open University, using LaTeX.
Printed in the United Kingdom by Hobbs the Printers Limited, Brunel Road, Totton, Hampshire, SO40 3WX.


ISBN 978 1 4730 2346 8 
2.1 


Contents 


Unit C1 Linear equations and matrices 


Introduction to Book C 


Introduction


1 Systems of linear equations


1.1 Systems in two and three unknowns 

1.2 Systems in n unknowns 

1.3 Solving systems 

1.4 Applications 
2 Row-reduction 

2.1 Augmented matrices 

2.2 Elementary row operations 

2.3 Solving linear equations systematically 
3 Matrix operations 

3.1 Matrix arithmetic 

3.2 Matrix multiplication 

3.3 Transposition of matrices 

3.4 Matrix form of a system of linear equations 
4 Matrix inverses 

4.1 Matrix inverses 

4.2 Invertibility Theorem 

4.3 Invertibility and systems of linear equations 

4.4 Elementary matrices 

4.5 Proof of the Invertibility Theorem 
5 Determinants 

5.1 Systems of linear equations and determinants

5.2 Evaluating determinants 

5.3 Properties of determinants 

5.4 Determinants and inverses of matrices 
Summary 


Learning outcomes 


Solutions to exercises 


Unit C2 Vector spaces 
Introduction 


1 Vector spaces 
1.1 Euclidean spaces 
1.2 Real vector spaces 


2 Linear combinations and spanning sets 
2.1 Linear combinations 
2.2 Spanning sets 


3 Bases and dimension 
3.1 Linear independence and dependence 
3.2 Bases 
3.3 Standard bases 
3.4 Dimension 


4 Subspaces 
4.1 Definition 
4.2 Bases and dimension 


5 Orthogonal bases 
5.1 Orthogonal bases in R²
5.2 Orthogonal bases in Rⁿ
5.3 Constructing orthogonal bases 
5.4 Orthonormal bases 
5.5 Other vector spaces 


Summary 
Learning outcomes 


Solutions to exercises 




Unit C3 Linear transformations 


Introduction


1 Introducing linear transformations
1.1 What is a linear transformation? 
1.2 Examples of linear transformations 
1.3 Linear combinations of vectors 


2 Matrices of linear transformations 
2.1 Finding matrix representations 
2.2 An equivalent definition 

3 Composition and invertibility 
3.1 Composition Rule 
3.2 Invertible linear transformations 
3.3 Isomorphisms 

4 Image and kernel 
4.1 Image of a linear transformation 
4.2 Kernel of a linear transformation 
4.3 Dimension Theorem 

Summary 


Learning outcomes 


Solutions to exercises 




Unit C4 Eigenvectors 


Introduction


1 Eigenvalues and eigenvectors

1.1 What is an eigenvector? 

1.2 Finding eigenvalues and eigenvectors 
1.3 Eigenspaces 


2 Diagonalising matrices 
2.1 Eigenvector bases 
2.2 Transition matrices 
2.3 Diagonalisation 

3 Symmetric matrices 
3.1 Diagonalising symmetric matrices 
3.2 Orthogonal matrices 

4 Conics and quadrics 
4.1 Classifying conics 
4.2 Classifying quadrics 

Summary 


Learning outcomes 


Solutions to exercises 


Acknowledgements 


Index 




Unit C1 
Linear equations and matrices 


Introduction to Book C 


Systems of linear equations in several variables arise in areas as diverse as 
science, technology and economics. The solutions to such systems can 
provide answers to a wide range of problems, from supply and demand 
dependencies in economics to working out currents in electrical networks. 
Apart from such practical applications, solving systems of linear equations 
is also an interesting mathematical problem in itself. Much effort has been 
devoted to solving systems of linear equations. You will see that this 
process is not always straightforward, especially if the number of variables 
is large. 


One key issue is whether a given system of linear equations has any 
solutions at all. As a specific example of the types of situation that may 
occur, consider the following three rather similar pairs of linear equations 
in the variables x and y. 


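$$\begin{aligned} x + 3y &= 5 \\ -2x + 6y &= 2 \end{aligned} \qquad \begin{aligned} -x + 3y &= 5 \\ -2x + 6y &= 2 \end{aligned} \qquad \begin{aligned} -x + 3y &= 1 \\ -2x + 6y &= 2 \end{aligned}$$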


The first pair of equations has the unique solution x = 2, y = 1, whereas
the second pair has no solutions, and the third pair has infinitely many
solutions (for example, x = −1, y = 0, and x = 5, y = 2). You will see that
these different outcomes may be understood:


• algebraically, by studying the matrix of coefficients of the equations, and
introducing a function of these coefficients, called the determinant,

• geometrically, by interpreting solutions of the equations as points of
intersection of the corresponding pairs of straight lines drawn in an
(x, y)-plane.
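To make the three outcomes concrete, here is a minimal Python sketch (an illustration only, not the method developed in this unit) that classifies a pair of equations ax + by = c, dx + ey = f. It assumes that each equation genuinely involves x or y, uses the standard `fractions` module for exact arithmetic, and relies on the quantity ae − bd, the determinant of the coefficients mentioned above; the function name `classify_2x2` is hypothetical.

```python
from fractions import Fraction


def classify_2x2(a, b, c, d, e, f):
    """Classify the system  a*x + b*y = c,  d*x + e*y = f.

    Assumes each equation is linear in the sense of this unit, i.e.
    (a, b) != (0, 0) and (d, e) != (0, 0).
    Returns ("unique", (x, y)), ("none", None) or ("infinite", None).
    """
    a, b, c, d, e, f = (Fraction(v) for v in (a, b, c, d, e, f))
    det = a * e - b * d  # determinant of the coefficients
    if det != 0:
        # The lines meet at exactly one point (Cramer's rule gives it).
        return "unique", ((c * e - b * f) / det, (a * f - c * d) / det)
    # det == 0: the coefficient rows are proportional, so the lines are
    # parallel or coincident. They coincide exactly when the constant
    # terms are in the same proportion as the coefficients.
    if a * f - c * d == 0 and b * f - c * e == 0:
        return "infinite", None
    return "none", None


print(classify_2x2(1, 3, 5, -2, 6, 2))   # first pair
print(classify_2x2(-1, 3, 5, -2, 6, 2))  # second pair
print(classify_2x2(-1, 3, 1, -2, 6, 2))  # third pair
```

Applied to the three pairs above, it reports a unique solution x = 2, y = 1 for the first pair, no solutions for the second and infinitely many for the third.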


The algebraic approach as well as the geometric approach can be 
generalised for systems of linear equations that involve more than two 
variables. The geometric approach will require us to use a generalisation of 
the plane called n-dimensional Euclidean space, whose elements are of the 
form (x₁, x₂, ..., xₙ), where x₁, x₂, ..., xₙ are real numbers. Depending on
the context, we will interpret these elements as either points or vectors in 
Euclidean space. Although it is only easy to visualise objects in 
n-dimensional space when n = 1, 2 or 3, this more general Euclidean space 
is a convenient environment in which to develop the theory needed to 
analyse the solutions of systems of linear equations. 


You will see that a key tool in this theory is the concept of a linear 
transformation which, in its basic form, is a function from one Euclidean 
space to another that preserves certain aspects of the geometric structure 
of the Euclidean space. For example, the function 


$$t(x, y) = (x + 3y, -2x + 6y)$$


is a linear transformation from R² to R², which is closely related to the
first pair of equations above. Indeed, solving that pair of equations is
equivalent to finding a point (x, y) in R² such that the function t maps
(x, y) to the point (5, 2). This suggests that we can obtain information
about the solutions of systems of linear equations by studying the 
corresponding linear transformations. 
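The correspondence between the first pair of equations and the transformation t can be checked directly. A minimal Python sketch, with t written as an ordinary function on pairs of numbers:

```python
def t(x, y):
    """The linear transformation t(x, y) = (x + 3y, -2x + 6y) from R^2 to R^2."""
    return (x + 3 * y, -2 * x + 6 * y)


# Solving  x + 3y = 5,  -2x + 6y = 2  means finding (x, y) with t(x, y) = (5, 2).
print(t(2, 1))  # (5, 2)
```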


But linear transformations arise in situations apart from that of solving 
equations. For example, they are needed to manipulate computer graphic 
images, as illustrated in Figure 1. 




Figure 1 The effect of a rotation, reflection and shear on an image 


Finally, the range of available linear transformations can be increased 
greatly by introducing the notion of a vector space. This is a generalisation 
of n-dimensional Euclidean space, and it may be finite-dimensional or 
infinite-dimensional. The elements of a vector space are sometimes called 
vectors, but they can be very general objects; for example, you will look at 
vector spaces whose elements are real functions, and linear transformations 
between such vector spaces that arise from operations on real functions 
such as differentiation and integration. In this way, vector spaces and their 
associated linear transformations form a very general context in which 
many seemingly unrelated problems can be studied using similar 
techniques. 


In this book on linear algebra you will learn about all these concepts: 
solving systems of linear equations, matrices, vector spaces and linear 
transformations. You will also use this theory to classify conics and 
quadrics. 


Introduction 


In this first unit of linear algebra you will begin by considering systems of 
linear equations in two and three unknowns. You will then see how 
matrices can be used as a concise way of representing systems of linear 
equations, before going on to study matrices themselves. You will see how 
properties of the matrix of coefficients may be used to quickly determine 
whether the system of linear equations has a unique solution. 


Many of the ideas and methods you will meet in this unit will also be used 
in the subsequent three units on linear algebra. 


1 Systems of linear equations 


In this section you will revise systems of linear equations in two and three 
unknowns and see how these ideas extend to systems in more unknowns. 


Recall that a system of linear equations in two (or three) unknowns is 
a collection of linear equations each written in terms of a set of two (or 
three) unknowns. A solution to a system of linear equations is an 
assignment of values to the unknowns that makes all the equations hold 
simultaneously; therefore such a system is also called a system of 
simultaneous linear equations in the given set of unknowns. 


1.1 Systems in two and three unknowns 


Systems in two unknowns: one equation 


In Unit A1 Sets, functions and vectors, you saw that an equation of the
form 


$$ax + by = c,$$


where a, b and c are real numbers, and a and b are not both zero, 
represents a line in R². There are infinitely many solutions to this
equation: one corresponding to each point on the line.


Systems in two unknowns: two equations 


The solutions to the following system of two linear equations 


$$\begin{aligned} ax + by &= c \\ dx + ey &= f \end{aligned}$$

in the two unknowns x and y, where a, b, ..., f are real numbers,
correspond to the points of intersection of these two lines in R².


Now, two arbitrary lines in R² may intersect at a unique point, be parallel,
or coincide, which means that solving a system of two linear equations in 
two unknowns yields exactly one of the following three situations. 


• There is a unique solution, when the two lines represented by the
equations intersect at a unique point, as illustrated in Figure 2.

For example, the system
$$\begin{aligned} x - y &= -1 \\ 2x + y &= 4 \end{aligned}$$
has the unique solution x = 1, y = 2, corresponding to the unique point
of intersection (1, 2) of the two lines in R².




Figure 2 Two lines 
intersecting at a unique point 




Figure 3 Two parallel lines 
with no point of intersection 




Figure 4 Two coincident lines 


Figure 5 Two parallel planes 


• There is no solution, when the two lines represented by the equations are
parallel, as illustrated in Figure 3.

For example, the system
$$\begin{aligned} x - y &= -1 \\ x - y &= \ldots \end{aligned}$$
represents two parallel lines in R² that do not intersect, and so the
system has no solution.


• There are infinitely many solutions, when the two lines represented by
the equations coincide, as illustrated in Figure 4.

For example, the system
$$\begin{aligned} -6x + 3y &= -6 \\ 2x - y &= 2 \end{aligned}$$
has infinitely many solutions, as the two equations represent the same
line in R²: each equation is a multiple of the other. In a sense, the
two lines intersect at all of their points; that is, each pair of values for x
and y satisfying 2x − y = 2 is a solution to this system.


Systems in three unknowns: one equation 
In Unit A1 you saw that an equation of the form
$$ax + by + cz = d,$$
where a, b, c and d are real numbers, and a, b and c are not all zero,
represents a plane in R³. There are infinitely many solutions to this
equation: one corresponding to each point in the plane.


Systems in three unknowns: two equations 
The solutions to the system of two linear equations 


$$\begin{aligned} ax + by + cz &= d \\ ex + fy + gz &= h \end{aligned}$$

in the three unknowns x, y and z, where a, b, ..., h are real numbers,
correspond to the points of intersection of these two planes in R³.


Two arbitrary planes in R³ may intersect, be parallel or coincide. In
general, when two distinct planes in R³ intersect, the set of common points
is a line that lies in both planes. This means that solving a system of two
linear equations in three unknowns yields exactly one of the following two
situations.


• There is no solution, when the two planes represented by the equations
are parallel, as illustrated in Figure 5.

For example, the system
$$\begin{aligned} x + y + z &= 1 \\ x + y + z &= 2 \end{aligned}$$
represents two parallel planes in R³ and so has no solutions.


• There are infinitely many solutions, when the two planes represented by
the equations coincide, or when they intersect in a line, as illustrated in
Figures 6 and 7, respectively.

For example,
$$\begin{aligned} x + y + z &= 1 \\ 2x + 2y + 2z &= 2 \end{aligned}$$
has infinitely many solutions, as the two equations represent the same
plane in R³. Each set of values for x, y and z satisfying x + y + z = 1 is
a solution to this system, such as x = 1, y = 0, z = 0 and x = −2, y = 4,
z = −1.


Similarly, the system 


$$\begin{aligned} x + y + z &= 1 \\ x + y &= 1 \end{aligned}$$

has infinitely many solutions: the planes in R³ represented by the two
equations intersect in a line. The z-coordinate of each point on this line
is zero, and so the line lies in the (x, y)-plane. Each set of values for x, y
and z satisfying x + y = 1 and z = 0 is a solution to this system, such as
x = 1, y = 0, z = 0 and x = 5, y = −4, z = 0.


Systems in three unknowns: three equations 


In a similar way, the solutions to the system of three linear equations 


$$\begin{aligned} ax + by + cz &= d \\ ex + fy + gz &= h \\ ix + jy + kz &= l \end{aligned}$$

in the three unknowns x, y and z, where a, b, ..., l are real numbers,
correspond to the points of intersection of these three planes in R³.


Three arbitrary planes in R³ may meet each other in a number of different
ways. We illustrate these possibilities below. A system of three linear
equations in three unknowns yields exactly one of the following three
situations.


• There is a unique solution, when the three planes represented by the
equations intersect at a unique point, as illustrated in Figure 8.

For example, the system
$$\begin{aligned} x + y + z &= 1 \\ x + y &= 1 \\ x - y &= -1 \end{aligned}$$
has the unique solution x = 0, y = 1, z = 0, corresponding to the unique
point of intersection (0, 1, 0) of the three planes in R³.




Figure 6 Two coincident 
planes 


Figure 7 Two planes 
intersecting in a line 


Figure 8 Three planes 
intersecting in a unique point 




• There is no solution, when two (or three) of the planes represented by
the equations are parallel, or when the three planes form a triangular
prism, as illustrated in Figures 9 and 10, respectively.

For example, the system
$$\begin{aligned} x + y + z &= 1 \\ x + y + z &= 2 \\ x + y - z &= 0 \end{aligned}$$
represents three planes in R³, the first two of which are parallel, and so
the system has no solutions.

Figure 9 Three planes, two of which are parallel

Similarly, the system
$$\begin{aligned} x + y &= 1 \\ x + z &= 1 \\ -y + z &= 1 \end{aligned}$$
has no solutions: the planes in R³ represented by the three equations
intersect in pairs, forming a triangular prism, and so there are no points
common to all three planes.

Figure 10 Three planes intersecting in pairs, forming a prism

• There are infinitely many solutions, when the three planes that the
equations represent intersect either in a plane or in a line, as illustrated
in Figures 11 and 12, respectively.
For example, the system
$$\begin{aligned} x + y + z &= 1 \\ -x - y - z &= -1 \\ 2x + 2y + 2z &= 2 \end{aligned}$$
has infinitely many solutions, as the three equations all represent the
same plane in R³: the equations are multiples of one another. Each set
of values for x, y and z satisfying x + y + z = 1 is a solution to this
system, such as x = 1, y = 0, z = 0 and x = −1, y = 3, z = −1.

Figure 11 Three coincident planes

Similarly, the system
$$\begin{aligned} x + y + z &= 1 \\ x + y &= 1 \\ x + y - z &= 1 \end{aligned}$$
has infinitely many solutions: the planes in R³ represented by the three
equations intersect in a line. The z-coordinate of each point on this line
is zero, and so the line lies in the (x, y)-plane. Each set of values for x, y
and z satisfying x + y = 1 and z = 0 is a solution to this system, such as
x = 1, y = 0, z = 0 and x = −5, y = 6, z = 0.

Figure 12 Three planes intersecting in a line


1.2 Systems in n unknowns 


The equations for a line in R² and a plane in R³ are linear equations in
two and three unknowns, respectively. Similarly, an equation of the form


$$ax + by + cz + dw = e$$


is a linear equation in the four unknowns x, y, z and w, where a, ..., e are
real numbers, and a, b, c and d are not all zero. In general, we can define a
linear equation in any number of unknowns.


Definitions 


An equation of the form
$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b,$$
where a₁, a₂, ..., aₙ, b are real numbers, and a₁, ..., aₙ are not all
zero, is a linear equation in the n unknowns x₁, x₂, ..., xₙ. The
numbers aᵢ are the coefficients, and b is the constant term.


A linear equation has no terms that are products of unknowns, such as x₁²
or x₁x₄.


Exercise C1 


Which of the following are linear equations in the five unknowns
x₁, ..., x₅?
(a) x₁ + 3x₂ − x₃ − 5x₄ − 2x₅ = 0    (b) x₁ − x₂ + 2x₃x₄ + 3x₅ = 4


(c) 5x₂ − x₅ = 2    (d) a₁x₁ + a₂x₂² + ⋯ + a₅x₅² = b


We write a system of linear equations, or more precisely a general system
of m linear equations in n unknowns, as


$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m. \end{aligned}$$


The numbers bᵢ are the constant terms, the variables xⱼ are the unknowns
and the numbers aᵢⱼ are the coefficients. We use the double subscript ij to
show that aᵢⱼ is the coefficient of the jth unknown in the ith equation.
The number m of equations need not be the same as the number n of 
unknowns. 


A solution of a system of linear equations is a list of values for the 
unknowns that simultaneously satisfy each of the equations. In solving a 
system, we look for all the solutions — you have already seen that some 
systems have infinitely many solutions. 




Definitions 
The values x₁ = c₁, x₂ = c₂, ..., xₙ = cₙ are a solution of a system
of m linear equations in n unknowns, denoted by x₁, ..., xₙ, if these
values simultaneously satisfy all m equations of the system. The
solution set of the system is the set of all the solutions.
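The definition can be checked mechanically for any proposed values. The sketch below assumes that a system is stored as a list of coefficient rows, one per equation, together with a list of constant terms; `is_solution` is a hypothetical helper, not a library function. It is tried on the system x + y + z = 1, x + y = 1, x + y − z = 1 of three planes meeting in a line, discussed earlier.

```python
from fractions import Fraction


def is_solution(coeffs, constants, values):
    """Return True if `values` satisfy every equation of the system.

    coeffs[i][j] is the coefficient of the (j+1)th unknown in the
    (i+1)th equation, constants[i] is that equation's constant term,
    and values[j] is the value assigned to the (j+1)th unknown.
    """
    return all(
        sum(Fraction(a) * Fraction(v) for a, v in zip(row, values)) == Fraction(b)
        for row, b in zip(coeffs, constants)
    )


# x + y + z = 1,  x + y = 1,  x + y - z = 1
A = [[1, 1, 1], [1, 1, 0], [1, 1, -1]]
b = [1, 1, 1]
print(is_solution(A, b, [1, 0, 0]))   # True
print(is_solution(A, b, [-5, 6, 0]))  # True
print(is_solution(A, b, [0, 0, 1]))   # False: the second equation fails
```

Exact `Fraction` arithmetic is used so that non-integer values such as 1/3 are compared without rounding error.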


For example, you saw earlier that the system 


$$\begin{aligned} x + y + z &= 1 \\ x + y &= 1 \\ x - y &= -1 \end{aligned} \tag{1}$$


has the unique solution x = 0, y = 1, z = 0 corresponding to the unique 
point of intersection (0,1,0) of the three planes represented by these 
equations. We can write the solution set of this system as the set 
{(0,1,0)}, which has just one member. 


You also saw that the system 


$$\begin{aligned} x + y + z &= 1 \\ x + y + z &= 2 \\ x + y - z &= 0 \end{aligned} \tag{2}$$


has no solutions, so its solution set is the empty set. 


Definitions 


A system of linear equations is consistent when it has at least one 
solution, and inconsistent when it has no solutions. 


The system (1) is consistent, and the system (2) is inconsistent. 


When a system of linear equations has infinitely many solutions, we can 
write down a general solution from which all solutions can be found as 
follows. 


You saw earlier that the solutions of the system 


$$\begin{aligned} x + y + z &= 1 \\ x + y &= 1 \\ x + y - z &= 1 \end{aligned}$$


are the sets of values for x, y and z satisfying x + y = 1 and z = 0. The
unknowns x and y are related by the equation x + y = 1, which we can
rewrite as y = 1 − x. Thus for each real parameter k assigned to the
unknown x, we have a corresponding value 1 − k for the unknown y. We
write this general solution as


$$x = k,\quad y = 1 - k,\quad z = 0,\quad \text{where } k \in \mathbb{R}.$$




To highlight the connection between the solutions of the system and the 
intersection of the planes in R³, we can write the solution set as a set of
points in R³:

$$\{(k, 1 - k, 0) \in \mathbb{R}^3 : k \in \mathbb{R}\},$$
which we usually abbreviate to

$$\{(k, 1 - k, 0) : k \in \mathbb{R}\}.$$


Note that the order of the unknowns x, y, then z matters: the triples
(1, 0, 0) and (0, 1, 0) correspond to different solutions. We could have
assigned parameters differently and obtained alternative ways of writing
down the solution set. For example, if we assign the real parameter p to
the unknown y and rewrite the equation x + y = 1 as x = 1 − y, we get


$$\{(1 - p, p, 0) : p \in \mathbb{R}\}.$$
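The two descriptions pick out the same points, since taking p = 1 − k converts one parametrisation into the other. A short Python spot check of this over a handful of integer parameter values (a check, not a proof; the helper names `by_x` and `by_y` are hypothetical):

```python
def by_x(k):
    """Solution obtained by assigning the parameter k to the unknown x."""
    return (k, 1 - k, 0)


def by_y(p):
    """Solution obtained by assigning the parameter p to the unknown y."""
    return (1 - p, p, 0)


for k in range(-3, 4):
    # p = 1 - k in the second description gives the same point as k in the first.
    assert by_x(k) == by_y(1 - k)
    # Each such point satisfies  x + y + z = 1,  x + y = 1,  x + y - z = 1.
    x, y, z = by_x(k)
    assert (x + y + z, x + y, x + y - z) == (1, 1, 1)
print("checks passed")
```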


Homogeneous systems 


In the following systems of linear equations, the constant terms are all zero: 


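$$\begin{aligned} 2x + 3y &= 0 \\ x - y &= 0, \end{aligned} \tag{3}$$

$$\begin{aligned} x - y - z &= 0 \\ 2x + y - z &= 0 \\ -x + y + z &= 0. \end{aligned} \tag{4}$$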


Such systems are called homogeneous. 


Definitions 


A homogeneous system of linear equations is a system of linear 
equations in which each constant term is zero. 


A system containing at least one non-zero constant term is a 
non-homogeneous system. 


If we substitute x = 0, y = 0 into system (3), and x = 0, y = 0, z = 0 into
system (4), then all the equations are satisfied. These solutions are called 
trivial. 


Definitions 


The trivial solution to a system of homogeneous linear equations is 
the solution in which each unknown is equal to zero. 


A solution with at least one non-zero unknown is a non-trivial 
solution. 


A homogeneous system always has at least the trivial solution, and is
therefore always consistent, whereas a non-homogeneous system may be
inconsistent, and any solutions it does have are non-trivial.




Exercise C2 


Write down a general homogeneous system of m linear equations in n 
unknowns, and show that the solution set contains the trivial solution. 


Returning to system (4), we see that there are other solutions, unlike
system (3) which has no non-trivial solutions. For example, x = 2, y = −1,
z = 3 is a solution to system (4). In fact, this system has an infinite
solution set because the first and third equations are multiples of one
another. Geometrically, the three planes represented by these equations
intersect in a line. Figure 7 illustrates this situation, as the planes
represented by the first and third equations coincide. The solution set can
be written as {(2k, −k, 3k) : k ∈ R}.
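A quick numerical spot check that each triple (2k, −k, 3k) satisfies all three equations of system (4). This is only an illustration: integer values of k are tried, and the check shows that these points are solutions, not that there are no others.

```python
# Coefficients of system (4):  x - y - z = 0,  2x + y - z = 0,  -x + y + z = 0
COEFFS = [(1, -1, -1), (2, 1, -1), (-1, 1, 1)]


def lhs(point):
    """Values of the three left-hand sides of system (4) at the given point."""
    return [sum(a * v for a, v in zip(row, point)) for row in COEFFS]


# k = 0 gives the trivial solution; k = 1 gives x = 2, y = -1, z = 3.
for k in range(-5, 6):
    assert lhs((2 * k, -k, 3 * k)) == [0, 0, 0]
print(lhs((2, -1, 3)))  # [0, 0, 0]
```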


Number of solutions 


In Subsection 1.1 you saw that when m ≤ n ≤ 3, a system of m equations
in n unknowns has a solution set which either


• contains exactly one solution,
• is empty, or
• contains infinitely many solutions.


(When m = n = 1 we have one equation of the form ax = b, where a ≠ 0,
which has the unique solution x = b/a.)


In fact, as you will see in Unit C3 Linear transformations, the solution set 
of a system of m linear equations in n unknowns has one of these forms, 
for any natural numbers m and n. 


We observed earlier that two non-parallel planes in R³ intersect either in a
line or in a plane, so cannot intersect at a unique point. A consistent 
system of two linear equations in three unknowns therefore has an infinite 
solution set. In general, a consistent system of m equations in n unknowns, 
with m < n, has insufficient constraints on the unknowns to determine 
them uniquely; that is, it has an infinite solution set. 


1.3 Solving systems 


We now introduce a systematic method for solving systems of linear 
equations. This method is called Gauss-Jordan elimination. It entails 
successively transforming a system into simpler systems, in such a way 
that the solution set remains unchanged. The process ends when the 
solutions can be determined easily. You will meet this method again in 
Section 2, where you will use matrices to represent systems of linear 
equations. A strategy for solving systems of linear equations using 
Gauss-Jordan elimination is given there. 




The Gauss—Jordan elimination method was introduced by the 
geodesist Wilhelm Jordan (1842-1899) in the third edition of his 
Handbuch der Vermessungskunde (Handbook of Surveying) in 1888. In 
the same year the rather more obscure Luxembourg mathematician 
turned abbot Bernard Isidore Clasen (1829-1902) independently 
described the method, but his work did not become widely known. 
The method’s association with Carl Friedrich Gauss (1777-1855) is 
due to the fact that it can be regarded as a modification of the 
method of Gaussian elimination. Wilhelm Jordan is not to be 
confused with the algebraist Camille Jordan (1838-1922). 


The idea of Gauss-Jordan elimination is to reduce the number of 
unknowns in each equation. In general, we use the first equation to 
eliminate the first unknown from all the other equations, then use the 
second equation to eliminate another unknown (usually the second) from 
all the other equations, and so on. The actual order in which the 
unknowns are eliminated is flexible; however, it is sensible, at least 
initially, to proceed in order to avoid making mistakes. 
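The elimination idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the systematic strategy given in Section 2: each equation is stored as a list [a₁, ..., aₙ, b] of its coefficients followed by its constant term, arithmetic is exact via the standard `fractions` module, and the function name `gauss_jordan` is hypothetical. For each unknown in turn it interchanges equations if necessary, scales one equation so that the coefficient of that unknown is 1, and eliminates that unknown from every other equation.

```python
from fractions import Fraction


def gauss_jordan(rows):
    """Simplify a system given as rows [a_1, ..., a_n, b] (one per equation).

    Works through the unknowns in order, using only the three elementary
    operations, and returns the simplified rows with Fraction entries.
    Assumes at least one equation, all of the same length.
    """
    rows = [[Fraction(v) for v in row] for row in rows]
    m, n = len(rows), len(rows[0]) - 1  # m equations, n unknowns
    current = 0  # index of the equation used to eliminate the next unknown
    for col in range(n):
        if current == m:
            break
        # Look for an equation, from `current` onwards, containing this unknown.
        r = next((i for i in range(current, m) if rows[i][col] != 0), None)
        if r is None:
            continue  # this unknown cannot be eliminated; move on
        rows[current], rows[r] = rows[r], rows[current]     # interchange
        pivot = rows[current][col]
        rows[current] = [v / pivot for v in rows[current]]  # make coefficient 1
        for i in range(m):
            if i != current and rows[i][col] != 0:
                factor = rows[i][col]  # r_i -> r_i - factor * r_current
                rows[i] = [vi - factor * vc for vi, vc in zip(rows[i], rows[current])]
        current += 1
    return rows


# Worked Exercise C2:  x + y + 2z = 3,  2x + 2y + 3z = 5,  x - y = 5
for row in gauss_jordan([[1, 1, 2, 3], [2, 2, 3, 5], [1, -1, 0, 5]]):
    print(" ".join(str(v) for v in row))
```

Applied to the system of Worked Exercise C2, this performs the same operations as the worked solution, including the interchange of the second and third equations, and ends with the rows 1 0 0 3, 0 1 0 −2 and 0 0 1 1; that is, x = 3, y = −2, z = 1. When an unknown cannot be eliminated, as in Worked Exercise C3, the sketch simply moves on and leaves a row of zeros.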


Wilhelm Jordan 


To avoid confusion when applying this method, we label the current
equations r₁, r₂, and so on. This notation will be used in Section 2 where
we transform rows of matrices, hence the choice of the letter r.


We can then write down how we are transforming the preceding system to
obtain the current (simpler) system. We use the symbol ↔ (‘interchanges
with’) to indicate that two equations are to be interchanged; for example,
r₁ ↔ r₂ means that the first and second equations are interchanged. We
use the symbol → (‘goes to’) to show how an equation is to be
transformed. For example, r₂ → r₂ + r₁ means that the second equation of
the system is transformed by adding the first equation to it.


We start by illustrating this method with a system of two linear equations 
in two unknowns. Although this method is not the simplest way of solving 
this particular system, it proves very useful in solving more complicated 
systems. It is important that the operations we perform do not alter the 
solution set of the system. 


Worked Exercise C1


Solve the following system of two linear equations in two unknowns. 


$$\begin{aligned} 2x + 4y &= 10 \\ 4x + y &= 6 \end{aligned}$$




Solution 


We aim to simplify the system by eliminating the unknown y, or
y-term, from the first equation and the unknown x, or x-term, from
the second; that is, we aim to obtain equations of the form x = *,
y = *, where the asterisks denote numbers to be determined.


We label the two equations of the system.
$$\begin{array}{ll} r_1 & 2x + 4y = 10 \\ r_2 & 4x + y = 6 \end{array}$$

We simplify the first equation.

We divide it through by 2, so that the coefficient of x is equal
to 1.
$$\begin{array}{ll} r_1 \to \tfrac{1}{2}r_1 & x + 2y = 5 \\ & 4x + y = 6 \end{array}$$

At each step, we relabel (implicitly) the equations of the current
system. These two equations therefore become the new r₁ and r₂.

We then eliminate the x-term in the second equation.
We subtract 4 times the first equation from the second.
$$\begin{array}{ll} & x + 2y = 5 \\ r_2 \to r_2 - 4r_1 & -7y = -14 \end{array}$$

We now simplify the second equation of this new system.

We divide it through by −7; this yields a system which already
looks less complicated than the original system, but has the same
solution set.
$$\begin{array}{ll} & x + 2y = 5 \\ r_2 \to -\tfrac{1}{7}r_2 & y = 2 \end{array}$$

Next we eliminate the y-term from the first equation.
We subtract twice the second equation from the first.
$$\begin{array}{ll} r_1 \to r_1 - 2r_2 & x = 1 \\ & y = 2 \end{array}$$

We conclude that there is a unique solution: x = 1, y = 2.


As a check, we substitute x = 1 and y = 2 in the original system,
using the abbreviation LHS for the left-hand side of the original
equations and RHS for the right-hand side:

$$\begin{aligned} \text{LHS} &= (2 \times 1) + (4 \times 2) = 10 = \text{RHS}, \\ \text{LHS} &= (4 \times 1) + 2 = 6 = \text{RHS}. \end{aligned}$$




The steps we performed in the worked exercise above involve either 
multiplying (or dividing) an equation by a non-zero number, or changing 
one equation by adding (or subtracting) a multiple of another. Neither of 
these operations alters the solution set of the system. Changing the order 
in which we write down the equations also does not alter the solution set 
of the system. These are the three operations, called elementary 
operations, that we perform to simplify a system of linear equations when 
using the method of Gauss—Jordan elimination. 


Elementary operations 


The following operations do not change the solution set of a system of 
linear equations. 


1. Interchange two equations. 
2. Multiply an equation by a non-zero number. 


3. Change one equation by adding to it a multiple of another. 


Operation 2 includes division by a non-zero number, and operation 3 
includes subtracting a multiple of one equation from another. 


In symbols we represent these three elementary operations by
$$r_i \leftrightarrow r_j, \qquad r_i \to \alpha r_i, \qquad \text{and} \qquad r_i \to r_i + \beta r_j,$$


respectively, where α, β are non-zero numbers.
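The three elementary operations translate directly into code. In this minimal Python sketch, each equation is stored as a list of Fractions [a₁, ..., aₙ, b], indices are 0-based (so r₁ is `rows[0]`), and the function names `interchange`, `scale` and `add_multiple` are hypothetical. Replaying the steps of Worked Exercise C1 reproduces its solution.

```python
from fractions import Fraction


def interchange(rows, i, j):
    """r_i <-> r_j: interchange two equations."""
    rows[i], rows[j] = rows[j], rows[i]


def scale(rows, i, alpha):
    """r_i -> alpha r_i, where alpha is a non-zero number."""
    if alpha == 0:
        raise ValueError("alpha must be non-zero")
    rows[i] = [alpha * v for v in rows[i]]


def add_multiple(rows, i, j, beta):
    """r_i -> r_i + beta r_j, where r_j is a different equation."""
    if i == j:
        raise ValueError("r_j must be a different equation from r_i")
    rows[i] = [vi + beta * vj for vi, vj in zip(rows[i], rows[j])]


# Worked Exercise C1:  2x + 4y = 10,  4x + y = 6
rows = [[Fraction(v) for v in eq] for eq in ([2, 4, 10], [4, 1, 6])]
scale(rows, 0, Fraction(1, 2))   # r1 -> (1/2) r1:  x + 2y = 5
add_multiple(rows, 1, 0, -4)     # r2 -> r2 - 4 r1: -7y = -14
scale(rows, 1, Fraction(-1, 7))  # r2 -> -(1/7) r2: y = 2
add_multiple(rows, 0, 1, -2)     # r1 -> r1 - 2 r2: x = 1
for row in rows:
    print(" ".join(str(v) for v in row))
```

The final rows, 1 0 1 and 0 1 2, correspond to the equations x = 1 and y = 2.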


Exercise C3 


Perform elementary operations, as in Worked Exercise C1, to solve the 
following system of two linear equations in two unknowns. 


$$\begin{aligned} x + y &= 4 \\ 2x - y &= 5 \end{aligned}$$


We now solve a system of three linear equations in three unknowns. We 
use elementary operations to try to reduce the system to the following 
form, where again the asterisks denote numbers to be determined. 


$$\begin{aligned} x &= * \\ y &= * \\ z &= * \end{aligned}$$




Worked Exercise C2 


Solve the following system of three linear equations in three unknowns. 
$$\begin{aligned} x + y + 2z &= 3 \\ 2x + 2y + 3z &= 5 \\ x - y &= 5 \end{aligned}$$


Solution 


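We label the three equations and apply elementary operations to
simplify the system.
$$\begin{array}{ll} r_1 & x + y + 2z = 3 \\ r_2 & 2x + 2y + 3z = 5 \\ r_3 & x - y = 5 \end{array}$$

We eliminate the x-term from the second and third equations: we
subtract twice the first equation from the second, and then subtract
the first from the third.
$$\begin{array}{ll} & x + y + 2z = 3 \\ r_2 \to r_2 - 2r_1 & -z = -1 \\ r_3 \to r_3 - r_1 & -2y - 2z = 2 \end{array}$$

We now have no y-term in the second equation, and so cannot use
this equation to eliminate the y-term from the first and third
equations. We also cannot use the first equation to eliminate the
y-term from the third equation, as this would reintroduce an x-term.
However, we can use the third equation to eliminate the y-term from
the first equation. We interchange the second and third equations;
this is not strictly necessary, but it keeps the terms in order.
$$\begin{array}{ll} & x + y + 2z = 3 \\ r_2 \leftrightarrow r_3 & -2y - 2z = 2 \\ & -z = -1 \end{array}$$

We simplify this new second equation by dividing it through
by −2.
$$\begin{array}{ll} & x + y + 2z = 3 \\ r_2 \to -\tfrac{1}{2}r_2 & y + z = -1 \\ & -z = -1 \end{array}$$

We eliminate the y-term from the first equation by subtracting the
second equation from the first.
$$\begin{array}{ll} r_1 \to r_1 - r_2 & x + z = 4 \\ & y + z = -1 \\ & -z = -1 \end{array}$$

We simplify the third equation by multiplying it by −1.
$$\begin{array}{ll} & x + z = 4 \\ & y + z = -1 \\ r_3 \to -r_3 & z = 1 \end{array}$$

Finally, we eliminate the z-terms from the first and second equations
using the third.
$$\begin{array}{ll} r_1 \to r_1 - r_3 & x = 3 \\ r_2 \to r_2 - r_3 & y = -2 \\ & z = 1 \end{array}$$

So the system has the unique solution x = 3, y = −2, z = 1.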


Exercise C4 


Solve the following system of three linear equations in three unknowns. 
$$\begin{aligned} x + y - z &= 8 \\ 2x - y + z &= 1 \\ -x + 3y + 2z &= -8 \end{aligned}$$


Each system solved so far in this subsection has a unique solution. We now 
show how to apply the method to a system that does not have a unique 
solution. 


It is not usually possible to reduce a system with an infinite solution set to 
one where each equation contains just one unknown. This is illustrated by 
the following worked exercise. 


Worked Exercise C3 


Solve the following system of three linear equations in three unknowns. 
$$\begin{aligned} x + 2y \phantom{{}+z} &= 0\\ y - z &= 2\\ x + y + z &= -2 \end{aligned}$$




Solution 


We label the three equations and apply elementary operations to 
simplify the system. 


$$\begin{array}{ll} r_1 & x + 2y = 0\\ r_2 & y - z = 2\\ r_3 & x + y + z = -2 \end{array}$$


We eliminate the $x$-term from $r_3$ using $r_1$.


$$\begin{array}{ll} & x + 2y = 0\\ & y - z = 2\\ r_3 \to r_3 - r_1 & -y + z = -2 \end{array}$$


We eliminate the $y$-terms from $r_1$ and $r_3$ using $r_2$.


$$\begin{array}{ll} r_1 \to r_1 - 2r_2 & x + 2z = -4\\ & y - z = 2\\ r_3 \to r_3 + r_2 & 0x + 0y + 0z = 0 \end{array}$$


The current $r_3$ equation gives no constraints on $x$, $y$ and $z$: any values for $x$, $y$ and $z$ satisfy it.

If we were to try to use equation $r_2$ to eliminate the $z$-term from $r_1$, we would introduce a $y$-term. Similarly, using equation $r_1$ to eliminate the $z$-term from equation $r_2$ would reintroduce an $x$-term.


There are insufficient constraints on the unknowns to determine them 
uniquely; so the system has an infinite solution set. 


We have two equations, one ($x = -4 - 2z$) relating the unknowns $x$ and $z$, and the other ($y = 2 + z$) relating $y$ and $z$.

As each equation involves a $z$-term, we can choose any value we wish for $z$ and use the equations to find the corresponding values of $x$ and $y$ in terms of this value of $z$. We set $z$ equal to the real parameter $k$ to get a general solution.

We write the general solution as
$$x = -4 - 2k, \qquad y = 2 + k, \qquad z = k, \qquad k \in \mathbb{R}.$$
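As a quick sanity check of this general solution, the sketch below substitutes a few values of $k$ into the three original equations (the helper names are our own). Checking finitely many values illustrates the result but does not prove it; the algebra above shows that the solution works for every real $k$.

```python
from fractions import Fraction


def general_solution(k):
    """Point (x, y, z) of the solution set of Worked Exercise C3 for parameter k."""
    return (-4 - 2 * k, 2 + k, k)


def satisfies_system(x, y, z):
    """True if (x, y, z) satisfies all three original equations."""
    return (x + 2 * y == 0) and (y - z == 2) and (x + y + z == -2)


# Try a few values of the parameter, including a non-integer one.
samples = [0, 1, -3, Fraction(5, 2)]
results = [satisfies_system(*general_solution(k)) for k in samples]
print(results)  # [True, True, True, True]
```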


In the worked exercise above the equation $r_3$ was written as $0x + 0y + 0z = 0$ to highlight the fact that all the coefficients are zero; in future we will simply write the equivalent equation $0 = 0$. In this case, the original equation $r_3$ did not give rise to any constraints beyond those already given by $r_1$ and $r_2$.


Whenever the simplification results in an equation 0 = 0, we have, in 
effect, reduced the number of equations. We simplify the remaining 
equations as far as possible, in order to determine the solution set. 


Exercise C5 


Solve the following system of three linear equations in three unknowns. 


$$\begin{aligned} x + 3y - z &= 4\\ -x + 2y - 4z &= 6\\ x + 2y \phantom{{}-4z} &= 2 \end{aligned}$$


We now try to solve an inconsistent system. 


Worked Exercise C4 


Solve the following system of three linear equations in three unknowns. 
$$\begin{aligned} x + 2y + 4z &= 6\\ y + z &= 1\\ x + 3y + 5z &= 10 \end{aligned}$$


Solution 


We label the three equations and apply elementary operations to 
simplify the system. 


$$\begin{array}{ll} r_1 & x + 2y + 4z = 6\\ r_2 & y + z = 1\\ r_3 & x + 3y + 5z = 10 \end{array}$$


We eliminate the $x$-term from $r_3$ using $r_1$.


$$\begin{array}{ll} & x + 2y + 4z = 6\\ & y + z = 1\\ r_3 \to r_3 - r_1 & y + z = 4 \end{array}$$


Comparing $r_2$ and $r_3$, we can conclude at this point that the system is inconsistent, or we can carry out one further step to eliminate the $y$-terms from $r_1$ and $r_3$ using $r_2$.


$$\begin{array}{ll} r_1 \to r_1 - 2r_2 & x + 2z = 4\\ & y + z = 1\\ r_3 \to r_3 - r_2 & 0 = 3 \end{array}$$


Concentrating on the current $r_3$ equation ($0 = 3$), we see that there are no values of $x$, $y$ and $z$ that satisfy it. This system has no solutions.


This system of linear equations is inconsistent: the solution set is the 
empty set. 




Whenever the simplification results in an equation $0 = *$, where the asterisk $*$ denotes a non-zero number, we have an inconsistent system, since such an equation has no solutions. There is no point in simplifying the remaining equations further. As indicated in Worked Exercise C4, the inconsistency of the system could have been inferred at the penultimate stage, as the equations $y + z = 1$ and $y + z = 4$ form an inconsistent system.
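The same conclusion can be reached in one combined step, $r_3 \to r_3 - r_1 - r_2$, because neither $r_1$ nor $r_2$ is changed along the way. The sketch below is illustrative only: the list representation `[a, b, c, d]` and the helper names are our own. It confirms that this step produces $0 = 3$, and it tests for an equation of the form $0 = *$ with $*$ non-zero.

```python
# Worked Exercise C4, with each equation ax + by + cz = d stored as [a, b, c, d].
r1 = [1, 2, 4, 6]
r2 = [0, 1, 1, 1]
r3 = [1, 3, 5, 10]


def combine(*terms):
    """Return the sum of m * eq over the given (m, eq) pairs."""
    return [sum(m * eq[i] for m, eq in terms) for i in range(4)]


def is_contradiction(eq):
    """An equation 0x + 0y + 0z = d with d non-zero has no solutions."""
    return all(c == 0 for c in eq[:-1]) and eq[-1] != 0


# r3 -> r3 - r1 - r2 combines the steps r3 -> r3 - r1 and r3 -> r3 - r2.
new_r3 = combine((1, r3), (-1, r1), (-1, r2))
print(new_r3, is_contradiction(new_r3))  # [0, 0, 0, 3] True
```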


Exercise C6 


Solve the following system of three linear equations in three unknowns. 


$$\begin{aligned} x + y + z &= 6\\ -x + y - 3z &= -2\\ 2x + y + 3z &= 6 \end{aligned}$$


1.4 Applications 


Systems of linear equations frequently arise when we use mathematics to 
solve problems from both within mathematics and outside it. 


The following worked exercise illustrates how linear equations can be used 
to find the equation of a plane through three given points. 


Worked Exercise C5 


Determine the equation of the plane that contains the three points $(1, 3, 1)$, $(1, 5, 2)$ and $(2, 2, 1)$.




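We eliminate the $a$-term from $r_2$ and $r_3$ using $r_1$.

$$\begin{array}{ll} & a + 3b + c = d\\ r_2 \to r_2 - r_1 & 2b + c = 0\\ r_3 \to r_3 - 2r_1 & -4b - c = -d \end{array}$$

We simplify $r_2$.

$$\begin{array}{ll} & a + 3b + c = d\\ r_2 \to \tfrac{1}{2}r_2 & b + \tfrac{1}{2}c = 0\\ & -4b - c = -d \end{array}$$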


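We eliminate the $b$-term from $r_1$ and $r_3$ using $r_2$.

$$\begin{array}{ll} r_1 \to r_1 - 3r_2 & a - \tfrac{1}{2}c = d\\ & b + \tfrac{1}{2}c = 0\\ r_3 \to r_3 + 4r_2 & c = -d \end{array}$$

We eliminate the $c$-term from $r_1$ and $r_2$ using $r_3$.

$$\begin{array}{ll} r_1 \to r_1 + \tfrac{1}{2}r_3 & a = \tfrac{1}{2}d\\ r_2 \to r_2 - \tfrac{1}{2}r_3 & b = \tfrac{1}{2}d\\ & c = -d \end{array}$$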


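We conclude that this system has a unique solution (in terms of $d$):
$$a = \tfrac{1}{2}d, \qquad b = \tfrac{1}{2}d, \qquad c = -d.$$

We substitute these expressions into the equation of the plane to get
$$\tfrac{1}{2}dx + \tfrac{1}{2}dy - dz = d.$$
Here $d \neq 0$, since $d = 0$ would give $a = b = c = 0$, which is not the equation of a plane. Multiplying through by 2 and dividing through by $d$ yields a simpler equation for the plane:
$$x + y - 2z = 2.$$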


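As a check, we substitute the coordinates of each of the three points into this equation for the plane:
$$\begin{aligned} \text{LHS} &= (1 \times 1) + (1 \times 3) - (2 \times 1) = 2 = \text{RHS}, \checkmark\\ \text{LHS} &= (1 \times 1) + (1 \times 5) - (2 \times 2) = 2 = \text{RHS}, \checkmark\\ \text{LHS} &= (1 \times 2) + (1 \times 2) - (2 \times 1) = 2 = \text{RHS}. \checkmark \end{aligned}$$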
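As an independent check that avoids elimination altogether, we can use a different technique: the vector cross product. A normal vector $(a, b, c)$ to the plane is $(P_2 - P_1) \times (P_3 - P_1)$, and $d$ is the scalar product of this normal with $P_1$. The following minimal Python sketch uses hypothetical helper names of our own.

```python
def sub(p, q):
    """Componentwise difference p - q of two points in R^3."""
    return tuple(a - b for a, b in zip(p, q))


def cross(u, v):
    """Cross product u x v of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Scalar (dot) product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


P1, P2, P3 = (1, 3, 1), (1, 5, 2), (2, 2, 1)
n = cross(sub(P2, P1), sub(P3, P1))   # normal vector (a, b, c)
d = dot(n, P1)                        # the plane is a x + b y + c z = d
print(n, d)  # (1, 1, -2) 2
```

This reproduces the equation $x + y - 2z = 2$ found above.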


Exercise C7 


Determine the equation of the plane that contains the three points $(1, 0, 2)$, $(0, 3, 4)$ and $(1, 1, 3)$.


The final exercise in this section uses systems of linear equations to solve a 
different type of problem. The idea is to use the information given to write 
down two linear equations that simultaneously hold, and then to solve 
these to answer the question. 




Exercise C8 


The sum of the ages of my sister and my brother is 40 years. My brother is 
12 years older than my sister. How old is my sister? 


Because Gauss–Jordan elimination is a systematic method for solving systems of linear equations, it is straightforward to automate. Hence large systems of linear equations involving many variables can be solved easily by computer. Such systems, as well as systems of non-linear equations, arise in some methods of weather forecasting. Gauss–Jordan elimination also arises in coding theory, which underpins digital communication and data transmission.


2 Row-reduction 


In this section you will see how the method of Gauss—Jordan elimination 
can be applied using matrices, and that it can be formalised into a strategy 
that can be followed step by step. This method makes it easy to solve even 
quite large systems of linear equations. It involves a technique 
(row-reduction) that will be useful in another context later in this unit 
when we look at inverses of matrices. 


2.1 Augmented matrices 


We begin by using matrices as an abbreviated notation for a system of 
linear equations. 


A matrix is simply a rectangular array of objects, usually numbers, 
enclosed in brackets; in this module we use round brackets for matrices, 
although some texts use square ones. 


The objects in a matrix are called its entries. The entries along a horizontal line form a row, and those down a vertical line form a column. A matrix with $m$ rows and $n$ columns is an $m \times n$ matrix, and we say that it is a matrix of size $m \times n$.


A zero row of a matrix is a row consisting entirely of zeros, and a non-zero row has at least one non-zero entry. The first non-zero entry in a row (reading from left to right) of a matrix is the leading entry of that row; when such an entry in a row is the number 1, it is called a leading 1.


Here are some examples of matrices with some entries highlighted as 
explained below: 


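[First example matrix, whose first row is $2 \;\; -7$]

$$\begin{pmatrix} 3.17 & 2.23 & 7.05 & 0.00\\ 4.88 & 1.71 & 1.72 & 5.55 \end{pmatrix}$$

[Third example matrix, with a leading 1 in its second row and the leading entry $-5$ in its third row]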


The entries in the first row of the first matrix above are 2 and —7; the 
entries in the second column of the second matrix above are 2.23 and 1.71; 
the 1 in the second row of the third matrix is a leading 1, and the —5 in 
the third row of this matrix is a leading entry. 


We can abbreviate a system of linear equations by writing its coefficients 
and constants in the form of a matrix. For example, the system 
$$\begin{aligned} 4x + y &= -7\\ x - 3y &= 0 \end{aligned}$$

can be abbreviated as

$$\left(\begin{array}{cc|c} 4 & 1 & -7\\ 1 & -3 & 0 \end{array}\right).$$

It is helpful to draw the vertical line separating the coefficients of the unknowns on the left-hand sides of the equations from the constants on the right-hand sides.


In general, the system 


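$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\ &\;\;\vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned}$$

of $m$ linear equations in $n$ unknowns $x_1, x_2, \ldots, x_n$ is abbreviated as the matrix

$$\left(\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1\\ a_{21} & a_{22} & \cdots & a_{2n} & b_2\\ \vdots & \vdots & & \vdots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right).$$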


This matrix is called the augmented matrix of the system. The word 
augmented reflects the fact that it is made up of a matrix formed by the 
coefficients of the unknowns on the left-hand sides of the equations, 
augmented by a matrix (or column vector) formed by the constants on the 
right-hand sides. Later, we will sometimes consider these two matrices 
separately. 


In the augmented matrix each row corresponds to an equation, and each 
column (except the last) corresponds to an unknown, in the sense that it 
contains all the coefficients of that unknown from the various equations. 
The last column corresponds to the right-hand sides of the equations. 
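In code, an augmented matrix is often stored simply as the coefficient rows with the constants appended. The sketch below assumes a list-of-lists representation in which the vertical line is implicit: by our convention, the last entry of each row holds the constant. The function name `augmented_matrix` is hypothetical.

```python
def augmented_matrix(coefficients, constants):
    """Append the constant b_i to the end of row i of the coefficient matrix."""
    if len(coefficients) != len(constants):
        raise ValueError("need exactly one constant per equation")
    return [list(row) + [b] for row, b in zip(coefficients, constants)]


# The system 4x + y = -7, x - 3y = 0 from the text above.
A = [[4, 1], [1, -3]]
b = [-7, 0]
print(augmented_matrix(A, b))  # [[4, 1, -7], [1, -3, 0]]
```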


Worked Exercise C6 


Write down the augmented matrix of the following system of linear 
equations. 


$$\begin{aligned} x \phantom{{}+y} + 10z &= 5\\ 3x + y - 4z &= -1\\ 4x - 2y + 6z &= 1 \end{aligned}$$




Worked Exercise C7 


Write down the system of linear equations corresponding to the following augmented matrix, given that the unknowns are, in order, $x_1$, $x_2$.


$$\left(\begin{array}{cc|c} 1 & -2 & 5\\ 0 & 1 & 9\\ 4 & 3 & 0 \end{array}\right)$$


Exercise C9 


(a) Write down the augmented matrix of the following system of linear equations.
Aa, = 2x9 =- 
t+ 3z3 = 0 
— 3272+ 73=3 


(b) Write down the system of linear equations corresponding to the following augmented matrix, given that the unknowns are, in order, $x$, $y$, $z$, $w$.
2 3 0 7| 1 
© 1 -= Qj|=1 
1 0 3 -1 




2.2 Elementary row operations 


When you used Gauss-Jordan elimination to solve a system of linear 
equations in Section 1, you worked directly with the system itself; but it is 
often easier to apply the same method to its abbreviated form, the 
augmented matrix. The three elementary operations on the equations of 
the system correspond exactly to three equivalent operations on the rows 
of its augmented matrix. 


Recall that the three elementary operations are as follows. 

1. Interchange two equations. 

2. Multiply an equation by a non-zero number. 

3. Change one equation by adding to it a multiple of another. 


These correspond to the following operations on the rows of the augmented 
matrix. 


Elementary row operations 
1. Interchange two rows. 
2. Multiply a row by a non-zero number. 


3. Change one row by adding to it a multiple of another. 


We call these operations the elementary row operations of types 1, 2 
and 3, respectively. 
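The three elementary row operations are easy to express as functions. This is a minimal sketch, assuming a matrix stored as a list of rows, with rows numbered from 0 (so $r_1$ is index 0). The function names are our own, and each function returns a new matrix rather than changing its argument.

```python
def swap_rows(M, i, j):
    """Type 1: r_i <-> r_j."""
    M = [row[:] for row in M]
    M[i], M[j] = M[j], M[i]
    return M


def scale_row(M, i, alpha):
    """Type 2: r_i -> alpha * r_i, where alpha must be non-zero."""
    if alpha == 0:
        raise ValueError("alpha must be non-zero")
    M = [row[:] for row in M]
    M[i] = [alpha * x for x in M[i]]
    return M


def add_row_multiple(M, i, j, beta):
    """Type 3: r_i -> r_i + beta * r_j, where j must differ from i."""
    if i == j:
        raise ValueError("the other row j must differ from i")
    M = [row[:] for row in M]
    M[i] = [x + beta * y for x, y in zip(M[i], M[j])]
    return M


# Augmented matrix of x + y + 2z = 3, 2x + 2y + 3z = 5, x - y = 5.
M = [[1, 1, 2, 3], [2, 2, 3, 5], [1, -1, 0, 5]]
M = add_row_multiple(M, 1, 0, -2)   # r2 -> r2 - 2 r1
M = add_row_multiple(M, 2, 0, -1)   # r3 -> r3 - r1
print(M)  # [[1, 1, 2, 3], [0, 0, -1, -1], [0, -2, -2, 2]]
```

Type 3 excludes $j = i$ because adding a multiple of a row to itself is really a scaling, which is a type 2 operation.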


The next worked exercise shows a system of linear equations solved by 
Gauss-Jordan elimination. In Worked Exercise C2 we solved this system 
by performing elementary operations on the system itself; here we perform 
the corresponding elementary row operations on the augmented matrix of 
the system. You can see that here we have less to write down at each stage. 


In this worked exercise, and elsewhere, we use the same notation for elementary row operations as we use for elementary operations ($r_i \leftrightarrow r_j$, and so on).


Worked Exercise C8 


Solve the following system of linear equations. 


$$\begin{aligned} x + y + 2z &= 3\\ 2x + 2y + 3z &= 5\\ x - y \phantom{{}+3z} &= 5 \end{aligned}$$


Solution 


We perform a sequence of elementary row operations on the 
augmented matrix of the system. 




The idea is to transform the augmented matrix into the augmented matrix of a system that has the same solution set, but whose solution set is easy to write down.


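$$\begin{array}{l} r_1\\ r_2\\ r_3 \end{array} \left(\begin{array}{ccc|c} 1 & 1 & 2 & 3\\ 2 & 2 & 3 & 5\\ 1 & -1 & 0 & 5 \end{array}\right)$$

$$\begin{array}{l} {}\\ r_2 \to r_2 - 2r_1\\ r_3 \to r_3 - r_1 \end{array} \left(\begin{array}{ccc|c} 1 & 1 & 2 & 3\\ 0 & 0 & -1 & -1\\ 0 & -2 & -2 & 2 \end{array}\right)$$

$$\begin{array}{l} {}\\ r_2 \leftrightarrow r_3\\ {} \end{array} \left(\begin{array}{ccc|c} 1 & 1 & 2 & 3\\ 0 & -2 & -2 & 2\\ 0 & 0 & -1 & -1 \end{array}\right)$$

$$\begin{array}{l} {}\\ r_2 \to -\tfrac{1}{2}r_2\\ {} \end{array} \left(\begin{array}{ccc|c} 1 & 1 & 2 & 3\\ 0 & 1 & 1 & -1\\ 0 & 0 & -1 & -1 \end{array}\right)$$

$$\begin{array}{l} r_1 \to r_1 - r_2\\ {}\\ {} \end{array} \left(\begin{array}{ccc|c} 1 & 0 & 1 & 4\\ 0 & 1 & 1 & -1\\ 0 & 0 & -1 & -1 \end{array}\right)$$

$$\begin{array}{l} {}\\ {}\\ r_3 \to -r_3 \end{array} \left(\begin{array}{ccc|c} 1 & 0 & 1 & 4\\ 0 & 1 & 1 & -1\\ 0 & 0 & 1 & 1 \end{array}\right)$$

$$\begin{array}{l} r_1 \to r_1 - r_3\\ r_2 \to r_2 - r_3\\ {} \end{array} \left(\begin{array}{ccc|c} 1 & 0 & 0 & 3\\ 0 & 1 & 0 & -2\\ 0 & 0 & 1 & 1 \end{array}\right)$$

The corresponding system is
$$\begin{aligned} x &= 3\\ y &= -2\\ z &= 1. \end{aligned}$$
The unique solution is $x = 3$, $y = -2$, $z = 1$.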


It is important to appreciate the following point about elementary row 
operations. 


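When a sequence of elementary row operations is performed on a matrix, each row operation in the sequence produces a new matrix, and the following row operation is then performed on that new matrix. For example, the working for the first two row operations in the solution to the worked exercise above should, strictly, be set out as follows.

$$\begin{array}{l} r_1\\ r_2\\ r_3 \end{array} \left(\begin{array}{ccc|c} 1 & 1 & 2 & 3\\ 2 & 2 & 3 & 5\\ 1 & -1 & 0 & 5 \end{array}\right)$$

$$\begin{array}{l} {}\\ r_2 \to r_2 - 2r_1\\ {} \end{array} \left(\begin{array}{ccc|c} 1 & 1 & 2 & 3\\ 0 & 0 & -1 & -1\\ 1 & -1 & 0 & 5 \end{array}\right)$$

$$\begin{array}{l} {}\\ {}\\ r_3 \to r_3 - r_1 \end{array} \left(\begin{array}{ccc|c} 1 & 1 & 2 & 3\\ 0 & 0 & -1 & -1\\ 0 & -2 & -2 & 2 \end{array}\right)$$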


However, we often perform two or more row operations in one step, to save 
time. Whenever we do this, we must ensure that when a row is changed by 
one of these row operations, the new version of that row is used when 
performing later row operations. 


The easiest way to avoid difficulties is to perform two or more row operations in one step only if none of these row operations changes a row that is then used by another of these row operations. The above row operations $r_2 \to r_2 - 2r_1$ and $r_3 \to r_3 - r_1$ meet this criterion: the first changes only row 2, and the second does not involve row 2. In this module we perform two or more row operations in one step only if they meet this criterion.


Row-sum check 


We end this subsection by describing a simple checking method that can 
be useful for picking up arithmetical errors when we perform a sequence of 
elementary row operations on a matrix by hand. 


To apply this method, we proceed as follows. To the right of each row of 
the initial matrix, we write down the sum of the entries in that row. 


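$$\begin{array}{l} r_1\\ r_2\\ r_3 \end{array} \left(\begin{array}{ccc|c} 1 & 1 & 2 & 3\\ 2 & 2 & 3 & 5\\ 1 & -1 & 0 & 5 \end{array}\right) \begin{array}{l} 7 \;\;(= 1 + 1 + 2 + 3)\\ 12 \;\;(= 2 + 2 + 3 + 5)\\ 5 \;\;(= 1 - 1 + 0 + 5) \end{array}$$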


From then on, when performing elementary row operations, we treat this 
‘check column’ of numbers as if it were an extra column of the matrix, and 
perform the row operations on it. So the first step of the calculation in the 
solution to Worked Exercise C8 above would look as follows. 


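$$\begin{array}{l} {}\\ r_2 \to r_2 - 2r_1\\ r_3 \to r_3 - r_1 \end{array} \left(\begin{array}{ccc|c} 1 & 1 & 2 & 3\\ 0 & 0 & -1 & -1\\ 0 & -2 & -2 & 2 \end{array}\right) \begin{array}{l} 7\\ -2 \;\;(= 12 - 2 \times 7)\\ -2 \;\;(= 5 - 7) \end{array}$$

At each step in the calculation, each entry in this extra column should still be the sum of the entries in the corresponding row. If this is not the case, then an error has been made.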
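The row-sum check can be automated in the same way. This sketch, which uses hypothetical helper names of our own, appends the check column and applies the first two row operations of Worked Exercise C8 to every column, including the check column. It then shows the check catching a deliberately introduced slip. The check is not foolproof: it can miss errors whose effects happen to cancel within a row.

```python
def with_check_column(M):
    """Append the sum of each row's entries as an extra 'check' entry."""
    return [row + [sum(row)] for row in M]


def check_ok(C):
    """True if every check entry still equals the sum of the rest of its row."""
    return all(row[-1] == sum(row[:-1]) for row in C)


def add_row_multiple(M, i, j, beta):
    """r_i -> r_i + beta * r_j, applied to every column (check column included)."""
    M = [row[:] for row in M]
    M[i] = [x + beta * y for x, y in zip(M[i], M[j])]
    return M


C = with_check_column([[1, 1, 2, 3], [2, 2, 3, 5], [1, -1, 0, 5]])
print([row[-1] for row in C])          # [7, 12, 5]
C = add_row_multiple(C, 1, 0, -2)      # r2 -> r2 - 2 r1
C = add_row_multiple(C, 2, 0, -1)      # r3 -> r3 - r1
print(C[1], C[2], check_ok(C))         # [0, 0, -1, -1, -2] [0, -2, -2, 2, -2] True

# Simulate an arithmetic slip in one entry: the check now fails.
C[2][3] = 3
print(check_ok(C))                     # False
```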




2.3 Solving linear equations systematically 


We now describe a systematic method for solving a system of linear 
equations by Gauss—Jordan elimination. The method involves performing 
elementary row operations on the augmented matrix of the system. In fact 
you have already seen this method in action, in Worked Exercise C8. Here 
we detail the sequence of steps involved, setting out a general strategy. 


The strategy involves writing down the augmented matrix of the system of 
equations and then performing a sequence of elementary row operations 
that reduces it to a simpler form called row-reduced form. The system of 
equations corresponding to this row-reduced matrix has the same solution 
set as the original system but with the new system it is easy to work out 
what the solution set is. The process of reducing the matrix to 
row-reduced form is referred to as row-reduction. We start by describing 
what a row-reduced matrix looks like. 


Row-reduced matrices 


Here is an example of a row-reduced matrix. 


[Example of a row-reduced matrix, with its staircase form emphasised by a black line]


The entries of a row-reduced matrix have a staircase form which we have 
emphasised with a black line. All the entries below the staircase must be 0; 
the left-most entry on each line above the staircase must be a 1 and all the 
other entries in that column must be 0. The other entries above the 
staircase can be any numbers. 


The general form of a row-reduced matrix is illustrated below, where the 
entries not in a column containing a leading 1 are indicated with asterisks, 
and the fact that all the entries below the staircase are zero is indicated by 
the large zero. 


We can describe a row-reduced matrix more precisely by specifying that it must satisfy certain properties, as in the definition below.


Definition 


A row-reduced matrix is a matrix satisfying the following four properties.

1. Any zero rows are at the bottom of the matrix.
2. Each non-zero row has a leading 1.
3. Each leading 1 is to the right of the leading 1 in the row above.
4. Each leading 1 is the only non-zero entry in its column.


Property 3 gives a row-reduced matrix its staircase form, and property 4 
ensures that the entries above and below a leading 1 are all 0. 
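The four properties in the definition translate directly into a test. This is a minimal sketch, assuming a matrix given as a list of rows; `is_row_reduced` is our own name. Properties 1 to 3 are checked row by row, and property 4 is checked column by column.

```python
def is_row_reduced(M):
    """Check the four properties of a row-reduced matrix (a list of rows)."""
    leading_cols = []
    seen_zero_row = False
    for row in M:
        nonzero = [j for j, x in enumerate(row) if x != 0]
        if not nonzero:
            seen_zero_row = True
            continue
        if seen_zero_row:                 # property 1: zero rows at the bottom
            return False
        j = nonzero[0]
        if row[j] != 1:                   # property 2: the leading entry is a 1
            return False
        if leading_cols and j <= leading_cols[-1]:
            return False                  # property 3: staircase moves right
        leading_cols.append(j)
    for j in leading_cols:                # property 4: only non-zero in column
        if sum(1 for row in M if row[j] != 0) != 1:
            return False
    return True


print(is_row_reduced([[1, 0, 6, 7], [0, 1, -4, 2]]))  # True
print(is_row_reduced([[1, 2], [0, 1]]))               # False (property 4 fails)
```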


Here are some more examples of row-reduced matrices: 


[Example row-reduced matrices]


However, none of the following matrices are row-reduced: 


[Three example matrices that are not row-reduced]

The first matrix is not row-reduced as it has neither property 1 nor property 2. The second is not row-reduced as it does not have property 3; the third does not have property 4.


Exercise C10 


Which of the following are row-reduced matrices? 


[Matrices (a), (b) and (c)]




Finding solutions from row-reduced form 


Suppose we have been given a system of linear equations, and that we have 
written down its augmented matrix and performed a sequence of 
elementary row operations to reduce it to row-reduced form. We will 
describe these operations in detail shortly, but first we will consider how to 
find all the solutions from the row-reduced form. 


Unique solution 


Consider for example this row-reduced augmented matrix: 


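$$\left(\begin{array}{ccc|c} 1 & 0 & 0 & 8\\ 0 & 1 & 0 & 3\\ 0 & 0 & 1 & -1 \end{array}\right)$$

Suppose the unknowns are $x_1$, $x_2$ and $x_3$. Then the system of equations corresponding to this matrix is as follows.
$$\begin{aligned} x_1 &= 8\\ x_2 &= 3\\ x_3 &= -1 \end{aligned}$$
This system is already in solved form, and we can immediately write down the unique solution:
$$x_1 = 8, \qquad x_2 = 3, \qquad x_3 = -1.$$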


Unique solution 


Whenever the original system of equations has a unique solution it 
can be written down directly from the row-reduced matrix. 


No solution 


Now consider this row-reduced augmented matrix: 


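$$\left(\begin{array}{ccc|c} 1 & -6 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right)$$
We write down the system of equations corresponding to the matrix.
$$\begin{aligned} x_1 - 6x_2 &= 0\\ x_3 &= 0\\ 0 &= 1 \end{aligned}$$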


This time we find that one of the equations is 0 = 1. This equation cannot 
hold, so it follows that the system of equations has no solutions; that is, 
the equations are inconsistent. 


No solution 


Whenever the original system of equations is inconsistent, 
row-reducing the augmented matrix yields a system that includes the 
equation 0 = 1. 


Infinitely many solutions 


Now consider this row-reduced augmented matrix: 


$$\left(\begin{array}{ccc|c} 1 & 0 & 6 & 7\\ 0 & 1 & -4 & 2 \end{array}\right)$$


We can write down the system of equations corresponding to this 
row-reduced matrix, but this time the system does not immediately give us 
a solution nor does it include the equation 0 = 1. 


$$\begin{aligned} x_1 + 6x_3 &= 7\\ x_2 - 4x_3 &= 2 \end{aligned}$$


In this case we rearrange each equation so that everything except the first term on the left is moved over to the right-hand side. The effect of this is to express each leading unknown, that is, each unknown that corresponds to a column containing a leading 1, in terms of the other unknowns, the non-leading unknowns. Here, $x_1$ and $x_2$ are leading unknowns and $x_3$ is the only non-leading unknown.

$$\begin{aligned} x_1 &= 7 - 6x_3\\ x_2 &= 2 + 4x_3 \end{aligned}$$

Having expressed the two leading unknowns, $x_1$ and $x_2$, in terms of the non-leading unknown $x_3$, we can then choose any value we like for $x_3$, and the equations give us the corresponding values of $x_1$ and $x_2$. So the system has infinitely many solutions, one for each choice of value of $x_3$. If we set $x_3 = k$ ($k \in \mathbb{R}$), say, and substitute this into the expressions for $x_1$ and $x_2$, then we have all the unknowns expressed in terms of the parameter $k$. The general solution of the system is therefore:

$$\begin{aligned} x_1 &= 7 - 6k\\ x_2 &= 2 + 4k\\ x_3 &= k \qquad (k \in \mathbb{R}). \end{aligned}$$


Infinitely many solutions 


Whenever the original system of equations has infinitely many 
solutions, the general solution can be determined by setting the 
non-leading unknowns equal to parameters and expressing all the 
unknowns in terms of these parameters. 


As we noted in Subsection 1.2, a system of linear equations must have one 
of these three possibilities: a unique solution, no solution, or infinitely 
many solutions. 
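The three cases above can be read off mechanically. This sketch assumes that its input is already row-reduced (it does not check this), and it numbers the unknowns from 0, so index 0 stands for $x_1$. In the infinite case, it returns each leading unknown's constant term together with the coefficients of the non-leading unknowns on the right-hand side. The return format is our own choice.

```python
def solve_row_reduced(R):
    """Read off the solution set from a row-reduced augmented matrix R.

    Unknowns are numbered from 0, so column j holds the coefficients of x_{j+1}.
    Returns one of
      ("none", None)          if some row reads 0 = 1,
      ("unique", values)      if every unknown is a leading unknown,
      ("infinite", formulas)  otherwise, where formulas[j] = (c, coeffs)
                              means x_j = c + sum of coeffs[f] * x_f.
    """
    n = len(R[0]) - 1                 # number of unknowns
    leading = {}                      # column of a leading 1 -> its row
    for i, row in enumerate(R):
        nonzero = [j for j, v in enumerate(row) if v != 0]
        if not nonzero:
            continue                  # a zero row is the equation 0 = 0
        if nonzero[0] == n:
            return ("none", None)     # leading 1 in the last column: 0 = 1
        leading[nonzero[0]] = i
    free = [j for j in range(n) if j not in leading]
    if not free:
        return ("unique", [R[leading[j]][n] for j in range(n)])
    formulas = {}
    for j, i in leading.items():
        # Move the non-leading terms of row i over to the right-hand side.
        coeffs = {f: -R[i][f] for f in free if R[i][f] != 0}
        formulas[j] = (R[i][n], coeffs)
    return ("infinite", formulas)


print(solve_row_reduced([[1, 0, 6, 7], [0, 1, -4, 2]]))
# ('infinite', {0: (7, {2: -6}), 1: (2, {2: 4})})
```

The printed result encodes $x_1 = 7 - 6x_3$ and $x_2 = 2 + 4x_3$, as derived above.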




Worked Exercise C9 


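Solve the system corresponding to the following row-reduced augmented matrix. Assume that the unknowns are $x_1, x_2, x_3, x_4, x_5$.

$$\left(\begin{array}{ccccc|c} 1 & 0 & 2 & 0 & 5 & 4\\ 0 & 1 & -3 & 0 & -1 & 2\\ 0 & 0 & 0 & 1 & 3 & -7\\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right)$$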
Solution 


We write down the system of equations, ignoring the equation corresponding to the bottom row of the matrix, since it is just $0 = 0$.


The augmented matrix corresponds to the system

$$\begin{aligned} x_1 + 2x_3 + 5x_5 &= 4\\ x_2 - 3x_3 - x_5 &= 2\\ x_4 + 3x_5 &= -7. \end{aligned}$$

The system does not immediately give us a solution, so there is not a unique solution. Furthermore, there is no equation $0 = 1$, so the system is not inconsistent. Therefore it must be a system with infinitely many solutions. We express each leading unknown, $x_1$, $x_2$ and $x_4$, in terms of the non-leading unknowns, $x_3$ and $x_5$.


This system is equivalent to

$$\begin{aligned} x_1 &= 4 - 2x_3 - 5x_5\\ x_2 &= 2 + 3x_3 + x_5\\ x_4 &= -7 - 3x_5. \end{aligned}$$

We can choose any values for $x_3$ and $x_5$, and the equations will give us the corresponding values of $x_1$, $x_2$ and $x_4$. So the system does have infinitely many solutions, one for each choice of values for $x_3$ and $x_5$. To obtain the general solution of the system, we set $x_3$ and $x_5$ equal to parameters.

Setting $x_3 = k$ and $x_5 = l$ ($k, l \in \mathbb{R}$), we obtain the general solution

$$\begin{aligned} x_1 &= 4 - 2k - 5l\\ x_2 &= 2 + 3k + l\\ x_3 &= k\\ x_4 &= -7 - 3l\\ x_5 &= l \qquad (k, l \in \mathbb{R}). \end{aligned}$$
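To check this general solution, we can substitute it back into the equations encoded by the row-reduced matrix for a grid of parameter values. This is a minimal sketch with helper names of our own. As before, a finite check illustrates the result rather than proving it.

```python
from fractions import Fraction
from itertools import product

# Row-reduced augmented matrix of Worked Exercise C9 (unknowns x1, ..., x5).
R = [[1, 0, 2, 0, 5, 4],
     [0, 1, -3, 0, -1, 2],
     [0, 0, 0, 1, 3, -7],
     [0, 0, 0, 0, 0, 0]]


def general_solution(k, l):
    """(x1, x2, x3, x4, x5) for parameter values k and l, as in the text."""
    return (4 - 2 * k - 5 * l, 2 + 3 * k + l, k, -7 - 3 * l, l)


def satisfies(R, x):
    """True if x satisfies every equation encoded by a row of R."""
    return all(sum(a * xi for a, xi in zip(row[:-1], x)) == row[-1] for row in R)


values = [0, 1, -2, Fraction(1, 3)]
ok = all(satisfies(R, general_solution(k, l)) for k, l in product(values, repeat=2))
print(ok)  # True
```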


Exercise C11 


Solve the system corresponding to each of the following row-reduced 
augmented matrices. 


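(a) Assume that the unknowns are $x_1$, $x_2$.

[matrix]

(b) Assume that the unknowns are $x_1$, $x_2$, $x_3$.

$$\left(\begin{array}{ccc|c} 1 & 0 & 6 & 0\\ 0 & 1 & 7 & 0\\ 0 & 0 & 0 & 1 \end{array}\right)$$

(c) Assume that the unknowns are $x_1, x_2, x_3, x_4, x_5$.

$$\left(\begin{array}{ccccc|c} 0 & 0 & 1 & -3 & 0 & 8\\ 0 & 0 & 0 & 0 & 1 & 11\\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right)$$

(d) Assume that the unknowns are $x_1, x_2, x_3, x_4$.

$$\left(\begin{array}{cccc|c} 1 & 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 4 & 3\\ 0 & 0 & 1 & 0 & 0 \end{array}\right)$$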


Row-reduction strategy 


You have seen that, once the augmented matrix of a system of linear 
equations is reduced to row-reduced form, all the solutions of the system 
can easily be determined. You will now see that there is a systematic 
strategy for row-reducing any matrix using elementary row operations. 


The idea of the strategy is to take each row of the matrix in turn as the 
current row, starting with the first. With row 1 as the current row, we 
carry out four steps, then with row 2 as the current row we carry out the 
same four steps, and so on. In outline the steps are as follows. In step 1 we 
identify the column for the current row’s leading 1; in steps 2 and 3 we 
create a leading 1 in the current row. Finally in step 4 we make each entry 
above and below the leading 1 into a 0. 




Strategy C1


To row-reduce a matrix using elementary row operations, carry out the following four steps, first with row 1 as the current row, then with row 2 as the current row, and so on, until

• either every row has been the current row,
• or step 1 is not possible.


1. Select the first column from the left that has at least one non-zero 
entry in or below the current row. 


2. If the current row has a 0 in the selected column, interchange it 
with a row below that has a non-zero entry in that column. 


3. If the entry in the current row and the selected column is c, 
multiply the current row by 1/c to create a leading 1. 


4. Add suitable multiples of the current row to the other rows to 
make each entry above and below the leading 1 into a 0. 
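Strategy C1 translates almost line for line into code. The sketch below is one possible implementation, not the module's own. It uses exact `Fraction` arithmetic so that step 3 never introduces rounding error. In step 2 it interchanges the current row with the first suitable row below, although any suitable row would do. The name `row_reduce` is hypothetical.

```python
from fractions import Fraction


def row_reduce(M):
    """Row-reduce M following the four steps of Strategy C1.

    Returns a new matrix of Fractions; M itself is not changed.
    """
    A = [[Fraction(x) for x in row] for row in M]
    m, n = len(A), len(A[0])
    col = 0
    for r in range(m):                       # row r is the current row
        # Step 1: first column with a non-zero entry in or below row r.
        while col < n and all(A[i][col] == 0 for i in range(r, m)):
            col += 1
        if col == n:
            break                            # step 1 is not possible: stop
        # Step 2: if needed, interchange with a lower row with a non-zero entry.
        if A[r][col] == 0:
            s = next(i for i in range(r + 1, m) if A[i][col] != 0)
            A[r], A[s] = A[s], A[r]
        # Step 3: multiply the current row by 1/c to create a leading 1.
        c = A[r][col]
        A[r] = [x / c for x in A[r]]
        # Step 4: make every other entry in this column into a 0.
        for i in range(m):
            if i != r and A[i][col] != 0:
                f = A[i][col]
                A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        col += 1
    return A


# Augmented matrix of Worked Exercise C8.
R = row_reduce([[1, 1, 2, 3], [2, 2, 3, 5], [1, -1, 0, 5]])
print([[str(x) for x in row] for row in R])
# [['1', '0', '0', '3'], ['0', '1', '0', '-2'], ['0', '0', '1', '1']]
```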


The strategy is illustrated in the following worked exercise, where the 
selected rows and columns of the matrix are highlighted by shading. 


You do not need to include this level of detail in your solutions, or the 
shading, just the row operations and current matrix. It is always a good 
idea to include the check column to try to pick up any errors in arithmetic! 


Worked Exercise C10 


Use Strategy C1 to row-reduce the following matrix. 


$$\begin{pmatrix} 2 & 4 & -2 & 2 & 4\\ 3 & 6 & -3 & 6 & 5\\ 2 & 1 & -11 & 2 & 6\\ -1 & 1 & 10 & -7 & -2 \end{pmatrix}$$


Steps 2 and 3 create a leading 1 in the current row.

The current row does not have a 0 in the column selected (it has a 2), so step 2 does not apply. In step 3 we multiply the current row by the reciprocal of 2; that is, by $\tfrac{1}{2}$.

In fact, when using this strategy to row-reduce a matrix there is often nothing to be done in step 2.

$$\begin{array}{l} r_1 \to \tfrac{1}{2}r_1\\ {}\\ {}\\ {} \end{array} \begin{pmatrix} 1 & 2 & -1 & 1 & 2\\ 3 & 6 & -3 & 6 & 5\\ 2 & 1 & -11 & 2 & 6\\ -1 & 1 & 10 & -7 & -2 \end{pmatrix} \begin{array}{r} 5\\ 17\\ 0\\ 1 \end{array}$$


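Step 4 makes each entry above and below the current leading 1 into a 0 by adding suitable multiples of the current row to the other rows.

$$\begin{array}{l} {}\\ r_2 \to r_2 - 3r_1\\ r_3 \to r_3 - 2r_1\\ r_4 \to r_4 + r_1 \end{array} \begin{pmatrix} 1 & 2 & -1 & 1 & 2\\ 0 & 0 & 0 & 3 & -1\\ 0 & -3 & -9 & 0 & 2\\ 0 & 3 & 9 & -6 & 0 \end{pmatrix} \begin{array}{r} 5\\ 2\\ -10\\ 6 \end{array}$$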


None of these row operations changes a row that is then used by another of these row operations, so they can be carried out in one go; in fact, this will always be the case for the row operations required in step 4.

Row 2 is the current row.

Step 1 identifies the column for the current row's leading 1: column 2.

$$\begin{array}{l} r_1\\ r_2\\ r_3\\ r_4 \end{array} \begin{pmatrix} 1 & 2 & -1 & 1 & 2\\ 0 & 0 & 0 & 3 & -1\\ 0 & -3 & -9 & 0 & 2\\ 0 & 3 & 9 & -6 & 0 \end{pmatrix} \begin{array}{r} 5\\ 2\\ -10\\ 6 \end{array}$$


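Steps 2 and 3 create a leading 1 in the current row.

The current row does have a 0 in the column selected, so in step 2 we interchange it with a row below that has a non-zero entry in that column. We choose to interchange it with row 4, although it does not matter which of these rows (row 3 or row 4) we use.

$$\begin{array}{l} {}\\ r_2 \leftrightarrow r_4\\ {}\\ {} \end{array} \begin{pmatrix} 1 & 2 & -1 & 1 & 2\\ 0 & 3 & 9 & -6 & 0\\ 0 & -3 & -9 & 0 & 2\\ 0 & 0 & 0 & 3 & -1 \end{pmatrix} \begin{array}{r} 5\\ 6\\ -10\\ 2 \end{array}$$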




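The current row has a 3 in the column selected, so in step 3 we multiply the current row by the reciprocal of 3; that is, by $\tfrac{1}{3}$.

$$\begin{array}{l} {}\\ r_2 \to \tfrac{1}{3}r_2\\ {}\\ {} \end{array} \begin{pmatrix} 1 & 2 & -1 & 1 & 2\\ 0 & 1 & 3 & -2 & 0\\ 0 & -3 & -9 & 0 & 2\\ 0 & 0 & 0 & 3 & -1 \end{pmatrix} \begin{array}{r} 5\\ 2\\ -10\\ 2 \end{array}$$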


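Step 4 makes each entry above and below the current leading 1 into a 0 by adding suitable multiples of the current row to the other rows.

$$\begin{array}{l} r_1 \to r_1 - 2r_2\\ {}\\ r_3 \to r_3 + 3r_2\\ {} \end{array} \begin{pmatrix} 1 & 0 & -7 & 5 & 2\\ 0 & 1 & 3 & -2 & 0\\ 0 & 0 & 0 & -6 & 2\\ 0 & 0 & 0 & 3 & -1 \end{pmatrix} \begin{array}{r} 1\\ 2\\ -4\\ 2 \end{array}$$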


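Row 3 is the current row.

Step 1 identifies the column for the current row's leading 1: column 4.

$$\begin{array}{l} r_1\\ r_2\\ r_3\\ r_4 \end{array} \begin{pmatrix} 1 & 0 & -7 & 5 & 2\\ 0 & 1 & 3 & -2 & 0\\ 0 & 0 & 0 & -6 & 2\\ 0 & 0 & 0 & 3 & -1 \end{pmatrix} \begin{array}{r} 1\\ 2\\ -4\\ 2 \end{array}$$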


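Steps 2 and 3 create a leading 1 in the current row.

Here, step 2 does not apply, and in step 3 we multiply the current row by the reciprocal of $-6$; that is, by $-\tfrac{1}{6}$.

$$\begin{array}{l} {}\\ {}\\ r_3 \to -\tfrac{1}{6}r_3\\ {} \end{array} \begin{pmatrix} 1 & 0 & -7 & 5 & 2\\ 0 & 1 & 3 & -2 & 0\\ 0 & 0 & 0 & 1 & -\tfrac{1}{3}\\ 0 & 0 & 0 & 3 & -1 \end{pmatrix} \begin{array}{r} 1\\ 2\\ \tfrac{2}{3}\\ 2 \end{array}$$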


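Step 4 makes each entry above and below the current leading 1 into a 0 by adding suitable multiples of the current row to the other rows.

$$\begin{array}{l} r_1 \to r_1 - 5r_3\\ r_2 \to r_2 + 2r_3\\ {}\\ r_4 \to r_4 - 3r_3 \end{array} \begin{pmatrix} 1 & 0 & -7 & 0 & \tfrac{11}{3}\\ 0 & 1 & 3 & 0 & -\tfrac{2}{3}\\ 0 & 0 & 0 & 1 & -\tfrac{1}{3}\\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \begin{array}{r} -\tfrac{7}{3}\\ \tfrac{10}{3}\\ \tfrac{2}{3}\\ 0 \end{array}$$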




If the strategy has been carried out correctly, then the matrix will be in 
row-reduced form when we stop, as was the case in the worked exercise 
above. 


In general, when applying the strategy we stop either after every row has 
been the current row and had the four steps carried out, or when we find 
that step 1 is not possible, which happens when there are one or more zero 
rows at the bottom of the matrix. 


Exercise C12 


Use Strategy C1 to row-reduce the following matrices. 


(a)
$$\begin{pmatrix} 1 & 5 & 1 & 4 & 5 & -1\\ 1 & 5 & 3 & 12 & 11 & 3\\ 3 & 15 & -1 & -4 & 3 & -6\\ -2 & -10 & 1 & 2 & -7 & 6 \end{pmatrix}$$


0 -8 8 -14 


at fj =f 26 
(b) -1 8 -12 8 
2 8 0 24 
1 4 0 14 


Modifying the strategy 


The strategy for row-reducing a matrix works for any matrix and can 
easily be programmed on a computer. But sometimes when carrying it out 
by hand we can spot places where carrying out a different row operation 
will make the calculations easier. Suppose we are working with the matrix 
below and that we have completed the four steps with row 1 as the current 
row. Row 2 becomes the current row and step 1 identifies column 2 for the 
current row’s leading 1. 


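$$\begin{array}{l} r_1\\ r_2\\ r_3 \end{array} \begin{pmatrix} 1 & 3 & 1 & 2\\ 0 & 4 & 5 & 7\\ 0 & 3 & 4 & 9 \end{pmatrix} \begin{array}{r} 7\\ 16\\ 16 \end{array}$$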


2 Row-reduction 


37 


Unit C1 Linear equations and matrices 


38 


Since the entry is not 0 there is nothing to be done in step 2. In step 3 we are now officially supposed to multiply the current row by $\tfrac{1}{4}$ in order to create a leading 1, but this would create inconvenient fractions in other entries of row 2. We can, however, spot a different row operation that also creates a leading 1 in the current row but avoids creating fractions, namely $r_2 \to r_2 - r_3$, since subtracting the 3 from the 4 creates a leading 1. So we perform this alternative row operation as an unofficial version of step 3, and this gives us the matrix below.

$$\begin{array}{l} {}\\ r_2 \to r_2 - r_3\\ {} \end{array} \begin{pmatrix} 1 & 3 & 1 & 2\\ 0 & 1 & 1 & -2\\ 0 & 3 & 4 & 9 \end{pmatrix} \begin{array}{r} 7\\ 0\\ 16 \end{array}$$
We now carry out step 4 as normal:
$$\begin{array}{l} r_1 \to r_1 - 3r_2\\ {}\\ r_3 \to r_3 - 3r_2 \end{array} \begin{pmatrix} 1 & 0 & -2 & 8\\ 0 & 1 & 1 & -2\\ 0 & 0 & 1 & 15 \end{pmatrix} \begin{array}{r} 7\\ 0\\ 16 \end{array}$$


We then carry on with the third row as the current row. 


In general, if you are trying to reduce a matrix to row-reduced form, you 
can use any elementary row operation. Note that even so it can sometimes 
be impossible to avoid fractions. 


Until you are very familiar with row-reducing matrices, it is sensible to 
follow the systematic strategy very closely, considering modifications only 
at step 3. 


When modifying the strategy and trying to identify an alternative row 
operation, it is important not to use rows above the current row, as the 
following exercise illustrates. 


Exercise C13 


Consider the following matrix where row 2 is the current row. 


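$$\begin{array}{l} r_1\\ r_2\\ r_3 \end{array} \begin{pmatrix} 1 & 3 & 1 & 2\\ 0 & 4 & 5 & 7\\ 0 & 3 & 4 & 9 \end{pmatrix} \begin{array}{r} 7\\ 16\\ 16 \end{array}$$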


Carry out the following row operation and explain why it is not an 
appropriate alternative operation for step 3. 


$$r_2 \to r_2 - r_1$$


When trying to choose an alternative row operation, rows below the 
current row can be used because the zeros at the beginning of these rows 
prevent them destroying the progress made so far. 


Uniqueness 


We have seen that there can be different ways to row-reduce a matrix. 
Whichever way you choose, you will always get the same answer. This is a 
consequence of the following theorem, which we state without proof. 


Theorem C1 


Every matrix has a unique row-reduced form. 


Putting it all together 


We now have all the techniques necessary for using Gauss—Jordan 
elimination to solve a system of linear equations using augmented matrices; 
we just need to put them all together as set out in the following strategy. 


Strategy C2 


To use Gauss—Jordan elimination to solve a given system of linear 
equations: 


1. form the augmented matrix 
2. row-reduce the augmented matrix to obtain its row-reduced form 


3. solve the simplified system of linear equations. 


Worked Exercise C11 


Use Strategy C2 to solve the following system of linear equations. 
$$\begin{aligned} 3x + 5y - 12z &= 4\\ x + y \phantom{{}-12z} &= 2\\ 2x + 3y - 4z &= 5 \end{aligned}$$




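We now carry on as usual, following the strategy.

$$\begin{array}{l} {}\\ r_2 \to r_2 - 3r_1\\ r_3 \to r_3 - 2r_1 \end{array} \left(\begin{array}{ccc|c} 1 & 1 & 0 & 2\\ 0 & 2 & -12 & -2\\ 0 & 1 & -4 & 1 \end{array}\right) \begin{array}{r} 4\\ -12\\ -2 \end{array}$$

$$\begin{array}{l} {}\\ r_2 \to \tfrac{1}{2}r_2\\ {} \end{array} \left(\begin{array}{ccc|c} 1 & 1 & 0 & 2\\ 0 & 1 & -6 & -1\\ 0 & 1 & -4 & 1 \end{array}\right) \begin{array}{r} 4\\ -6\\ -2 \end{array}$$

$$\begin{array}{l} r_1 \to r_1 - r_2\\ {}\\ r_3 \to r_3 - r_2 \end{array} \left(\begin{array}{ccc|c} 1 & 0 & 6 & 3\\ 0 & 1 & -6 & -1\\ 0 & 0 & 2 & 2 \end{array}\right) \begin{array}{r} 10\\ -6\\ 4 \end{array}$$

$$\begin{array}{l} {}\\ {}\\ r_3 \to \tfrac{1}{2}r_3 \end{array} \left(\begin{array}{ccc|c} 1 & 0 & 6 & 3\\ 0 & 1 & -6 & -1\\ 0 & 0 & 1 & 1 \end{array}\right) \begin{array}{r} 10\\ -6\\ 2 \end{array}$$

$$\begin{array}{l} r_1 \to r_1 - 6r_3\\ r_2 \to r_2 + 6r_3\\ {} \end{array} \left(\begin{array}{ccc|c} 1 & 0 & 0 & -3\\ 0 & 1 & 0 & 5\\ 0 & 0 & 1 & 1 \end{array}\right) \begin{array}{r} -2\\ 6\\ 2 \end{array}$$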


This matrix is in row-reduced form.
The corresponding system of equations is

$$
\begin{aligned}
x &= -3,\\
y &= 5,\\
z &= 1.
\end{aligned}
$$


Thus the solution is $x = -3$, $y = 5$, $z = 1$.
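Strategy C2 can be sketched in Python as follows. This is a minimal sketch, not the module's prescribed procedure: it assumes exact arithmetic via the standard `fractions` module, omits the check column, and always takes the first suitable row at or below the current row, so its intermediate steps may differ from a hand calculation. By Theorem C1 the row-reduced form it returns is nevertheless the same. The function name `row_reduce` is hypothetical.

```python
from fractions import Fraction


def row_reduce(matrix):
    """Return the row-reduced form of `matrix`, computed with exact fractions.

    Columns are processed left to right. For each column, a row at or below
    the current row with a non-zero entry is swapped into the current row,
    scaled to give a leading 1, and then used to clear every other entry in
    that column.
    """
    m = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    current = 0
    for col in range(cols):
        if current == rows:
            break
        # Only the current row and the rows below it are candidates.
        pivot = next((r for r in range(current, rows) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no leading 1 in this column
        m[current], m[pivot] = m[pivot], m[current]
        lead = m[current][col]
        m[current] = [x / lead for x in m[current]]
        for r in range(rows):
            if r != current and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[current])]
        current += 1
    return m


# Augmented matrix of Worked Exercise C11 (no check column):
# 3x + 5y - 12z = 4,  x + y = 2,  2x + 3y - 4z = 5.
augmented = [
    [3, 5, -12, 4],
    [1, 1, 0, 2],
    [2, 3, -4, 5],
]
reduced = row_reduce(augmented)
solution = [row[-1] for row in reduced]
print([str(v) for v in solution])  # ['-3', '5', '1']
```

Reading the solution straight from the last column works here only because the coefficient part has reduced to the identity; in general, step 3 of Strategy C2 still means interpreting the row-reduced form.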


Exercise C14 


Use Strategy C2 to solve the following system of equations. 
$$
\begin{aligned}
x_1 - 4x_2 - 4x_3 + 8x_4 + 6x_5 &= 2\\
2x_1 - 5x_2 - 6x_3 + 6x_4 + 9x_5 &= 3\\
2x_1 + 4x_2 + 9x_4 + 2x_5 &= 0
\end{aligned}
$$




3 Matrix operations 


In this section you will revise matrices and matrix operations such as 
matrix addition and matrix multiplication. You will also meet a useful 
operation called transposition. 


3.1 Matrix arithmetic 


Recall that a matrix with m rows and n columns is an $m \times n$ matrix. An
$n \times n$ matrix is called a square matrix.


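In general, we write A or $(a_{ij})$ to denote a matrix:

$$
\mathbf{A} =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
= (a_{ij}).
$$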
Ami Am2 ‘** Amn 


We call the entry in the ith row and jth column of a matrix A the
(i, j)-entry, and often denote it by $a_{ij}$. Although matrices are usually
distinguished in print by the use of bold typeface, when you handwrite
them you do not need to underline them, unlike letters that represent
vectors.


Matrices are closely related to vectors represented in component form. In
Unit A1 you performed vector arithmetic on vectors in both $\mathbb{R}^2$ and $\mathbb{R}^3$,
writing a vector in component form as a row vector. Such a row vector
can be regarded, respectively, as a $1 \times 2$ matrix or a $1 \times 3$ matrix with real
entries; the only difference is the lack of commas in the matrix
representation. For example, consider the following $1 \times 3$ matrix and the
corresponding row vector in $\mathbb{R}^3$:


$$
\begin{pmatrix} 1 & 2 & 3 \end{pmatrix} \quad\text{and}\quad (1, 2, 3).
$$


A column vector is a vector with the components written vertically; such
a vector in $\mathbb{R}^2$ or $\mathbb{R}^3$ can be regarded as a matrix with real entries that has
just a single column. For example, the following represents both a column
vector in $\mathbb{R}^3$ and the corresponding $3 \times 1$ matrix:


$$
\begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix}
$$


It should be clear from the context whether this object is a column vector,
with a geometrical interpretation in $\mathbb{R}^3$, or a matrix with real entries.


A matrix may have any size, $m \times n$ for any natural numbers m and n,
although we usually write a $1 \times 1$ matrix without the brackets and identify
it with its single entry.




In this way, matrices can be regarded as a generalisation of vectors with 
equality, the zero matrix and the operations of addition and scalar 
multiplication defined similarly. Whereas for vectors we defined these in 
terms of the components, for matrices we define them in terms of the 
entries. The details are given in the box below. 


Matrix arithmetic 


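Equality Two $m \times n$ matrices A and B are equal if all their
corresponding entries agree. We write $\mathbf{A} = \mathbf{B}$.

Zero matrix The $m \times n$ zero matrix $\mathbf{0}_{m,n}$ is the $m \times n$ matrix in
which all entries are 0. It is denoted by $\mathbf{0}$ when it is clear from
the context which size is intended.

Addition The sum of two $m \times n$ matrices $\mathbf{A} = (a_{ij})$ and $\mathbf{B} = (b_{ij})$
is the $m \times n$ matrix $\mathbf{A} + \mathbf{B} = (a_{ij} + b_{ij})$ obtained by adding the
corresponding entries:

$$
\mathbf{A} + \mathbf{B} =
\begin{pmatrix}
a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n}\\
a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn}
\end{pmatrix}.
$$

Addition of matrices of different sizes is not defined.

Negatives The negative of an $m \times n$ matrix $\mathbf{A} = (a_{ij})$ is the
$m \times n$ matrix obtained by taking the negatives of its entries:
$$-\mathbf{A} = (-a_{ij}).$$

Subtraction The difference of two $m \times n$ matrices $\mathbf{A} = (a_{ij})$ and
$\mathbf{B} = (b_{ij})$ is the $m \times n$ matrix $\mathbf{A} - \mathbf{B} = (a_{ij} - b_{ij})$ obtained by
subtracting the corresponding entries:

$$
\mathbf{A} - \mathbf{B} =
\begin{pmatrix}
a_{11} - b_{11} & a_{12} - b_{12} & \cdots & a_{1n} - b_{1n}\\
a_{21} - b_{21} & a_{22} - b_{22} & \cdots & a_{2n} - b_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} - b_{m1} & a_{m2} - b_{m2} & \cdots & a_{mn} - b_{mn}
\end{pmatrix}.
$$

Subtraction of matrices of different sizes is not defined.

Scalar multiplication The scalar multiple of an $m \times n$ matrix
$\mathbf{A} = (a_{ij})$ by a real number k is the $m \times n$ matrix obtained by
multiplying each entry in turn by k:

$$
k\mathbf{A} =
\begin{pmatrix}
ka_{11} & ka_{12} & \cdots & ka_{1n}\\
ka_{21} & ka_{22} & \cdots & ka_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
ka_{m1} & ka_{m2} & \cdots & ka_{mn}
\end{pmatrix}
= (ka_{ij}).
$$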
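The definitions in the box translate almost line for line into code. In this Python sketch a matrix is stored as a list of rows (a representation assumed for illustration), the function names are hypothetical, and a `ValueError` is raised where the box says an operation is not defined.

```python
def size(A):
    """Return (m, n) for an m x n matrix stored as a list of m rows."""
    return len(A), len(A[0])


def add(A, B):
    """A + B: add corresponding entries."""
    if size(A) != size(B):
        raise ValueError("addition of matrices of different sizes is not defined")
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def negative(A):
    """-A: take the negative of every entry."""
    return [[-a for a in row] for row in A]


def subtract(A, B):
    """A - B: subtract corresponding entries."""
    if size(A) != size(B):
        raise ValueError("subtraction of matrices of different sizes is not defined")
    return [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def scalar_multiple(k, A):
    """kA: multiply each entry of A by the real number k."""
    return [[k * a for a in row] for row in A]


def zero_matrix(m, n):
    """The m x n zero matrix."""
    return [[0] * n for _ in range(m)]


# Two 2 x 2 matrices chosen for illustration.
P = [[1, 2], [3, 4]]
Q = [[5, 6], [7, 8]]
print(add(P, Q))              # [[6, 8], [10, 12]]
print(subtract(P, Q))         # [[-4, -4], [-4, -4]]
print(scalar_multiple(3, P))  # [[3, 6], [9, 12]]
```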




For example, consider the matrices below. The first pair are not equal 
because a pair of corresponding entries differ, and the second pair are not 
equal as they have different sizes: 


[The two pairs of matrices are not recoverable from the source.]


The following are all examples of zero matrices: 


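$$
\begin{pmatrix} 0 & 0\\ 0 & 0 \end{pmatrix},\quad
\begin{pmatrix} 0 & 0 & 0 \end{pmatrix},\quad
\begin{pmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} 0 & 0 & \cdots & 0\\ 0 & 0 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 0 \end{pmatrix}.
$$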


The next worked exercise illustrates matrix addition. In the subsequent 
exercises you are asked to evaluate the results of various matrix operations. 


Worked Exercise C12 


Evaluate the following matrix sums, where possible. 
[The matrices in parts (a) and (b) are not recoverable from the source.]


Solution 


(a) [The evaluated sum is not recoverable from the source.]


(b) This sum is undefined since the matrices are of different sizes. 


Exercise C15 


Evaluate the following matrix sums, where possible. 


[The matrices in parts (a) to (d) are not recoverable from the source.]
Exercise C16 
Evaluate the following matrix differences, where possible. 
[The matrices in parts (a) and (b) are not recoverable from the source.]




Exercise C17 


Let 
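$$
\mathbf{A} = \begin{pmatrix} 5 & -3\\ 2 & 3\\ -1 & 0 \end{pmatrix}
\quad\text{and}\quad
\mathbf{B} = \begin{pmatrix} 2 & 1\\ -2 & -7\\ 3 & 5 \end{pmatrix}.
$$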


Evaluate the following. 
(a) 4A (b) 4B (c) 4A +4B (d) 4(A +B) 


In Exercise C15 you should have found that parts (a) and (b) gave the 
same answer; this is because matrix addition is commutative. In fact, 
matrix addition has the same properties as the additive properties of the 
real numbers, R, given in Unit A2 Number systems. Before listing these 
properties, we need the following notation: 


$M_{m,n}$ denotes the set of all $m \times n$ matrices with real entries.


We can now talk about arithmetic in $M_{m,n}$ and the properties it satisfies.


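Addition in $M_{m,n}$

A1 Closure For all $\mathbf{A}, \mathbf{B} \in M_{m,n}$,
$$\mathbf{A} + \mathbf{B} \in M_{m,n}.$$

A2 Associativity For all $\mathbf{A}, \mathbf{B}, \mathbf{C} \in M_{m,n}$,
$$(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C}).$$

A3 Additive identity For all $\mathbf{A} \in M_{m,n}$ and $\mathbf{0} \in M_{m,n}$,
$$\mathbf{A} + \mathbf{0} = \mathbf{A} = \mathbf{0} + \mathbf{A}.$$

A4 Additive inverses For each $\mathbf{A} \in M_{m,n}$, there is a
matrix $-\mathbf{A} \in M_{m,n}$ such that
$$\mathbf{A} + (-\mathbf{A}) = \mathbf{0} = -\mathbf{A} + \mathbf{A}.$$

A5 Commutativity For all $\mathbf{A}, \mathbf{B} \in M_{m,n}$,
$$\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}.$$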


The matrix $\mathbf{0}$ is known as the additive identity in $M_{m,n}$, and the
matrix $-\mathbf{A}$ in property A4 is known as the additive inverse of A.


These properties follow from the definition of matrix addition and the 
corresponding properties of the reals. The next worked exercise proves the 
closure property (A1) and the commutative property (A5); you are asked 
to prove the remaining properties in the following exercise. 


Worked Exercise C13 


By using the corresponding properties for the reals, prove that the 
following properties hold for $M_{m,n}$ under addition.

(a) The closure property (A1): $\mathbf{A} + \mathbf{B} \in M_{m,n}$.
(b) The commutative property (A5): $\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}$.


Exercise C18 


By using the corresponding properties for the reals, prove that the 
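following properties hold for $M_{m,n}$ under addition.

(a) The associative property (A2): $\mathbf{A} + (\mathbf{B} + \mathbf{C}) = (\mathbf{A} + \mathbf{B}) + \mathbf{C}$.
(b) The identity property (A3): $\mathbf{A} + \mathbf{0} = \mathbf{A} = \mathbf{0} + \mathbf{A}$.
(c) The inverses property (A4): $\mathbf{A} + (-\mathbf{A}) = \mathbf{0} = -\mathbf{A} + \mathbf{A}$.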


Recall from Subsection 3.1 of Unit B1 Symmetry and groups that a set 
with a binary operation is a group if the following four axioms hold: 


G1 (closure), G2 (associativity), G3 (identity) and G4 (inverses). 


The first four properties (A1–A4) of matrix addition show that the set of
all $m \times n$ matrices with real entries satisfies these four properties; that is,
$(M_{m,n}, +)$ is a group with additive identity the zero matrix $\mathbf{0}$, and $-\mathbf{A}$ the
additive inverse of A. The final property (A5) shows that it is in fact an
abelian group.




Although this unit concentrates on $M_{m,n}$, the set of matrices with real
entries, some other sets of $m \times n$ matrices also form a group under
addition. For example, the set of $m \times n$ matrices with entries in $\mathbb{Z}$, and
those with entries in $\mathbb{C}$, both form a group under addition. However, the
set of $m \times n$ matrices with entries in $\mathbb{N}$ does not form a group under
addition, since this set of matrices contains neither the zero matrix $\mathbf{0}$, nor
the additive inverse $-\mathbf{A}$ of a matrix A in the set.


Finally in this subsection we return to scalar multiplication of matrices. 
Recall, from Unit A2, that the reals satisfy a distributive property (D1) 
combining addition and multiplication: 


$$
a \times (b + c) = (a \times b) + (a \times c), \quad \text{for all } a, b, c \in \mathbb{R}.
$$


It turns out that the corresponding property holds for addition and scalar 
multiplication of matrices; you saw one example of this in Exercise C17(c) 
and (d) where 4(A + B) and 4A + 4B were equal. 


Combining addition and scalar multiplication in $M_{m,n}$
D1 Distributivity For all $\mathbf{A}, \mathbf{B} \in M_{m,n}$ and $k \in \mathbb{R}$,

$$k(\mathbf{A} + \mathbf{B}) = k\mathbf{A} + k\mathbf{B}.$$


You are asked to prove that this property holds in the next exercise. 


Exercise C19 


By using the corresponding property for the reals, prove that the
distributive property (D1) holds for $M_{m,n}$:

$$k(\mathbf{A} + \mathbf{B}) = k\mathbf{A} + k\mathbf{B}.$$
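A computer check cannot replace the proof asked for in Exercise C19, but it shows concretely what the property says. This Python sketch tests $k(\mathbf{A} + \mathbf{B}) = k\mathbf{A} + k\mathbf{B}$ for the matrices of Exercise C17 with $k = 4$, and then on randomly generated integer matrices; the helper names and the random test data are assumptions made for illustration.

```python
import random


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scalar_multiple(k, A):
    return [[k * a for a in row] for row in A]


def distributive_holds(k, A, B):
    """True if k(A + B) == kA + kB for these particular matrices."""
    return scalar_multiple(k, add(A, B)) == add(scalar_multiple(k, A),
                                                scalar_multiple(k, B))


# The matrices of Exercise C17.
A = [[5, -3], [2, 3], [-1, 0]]
B = [[2, 1], [-2, -7], [3, 5]]
print(distributive_holds(4, A, B))  # True


def random_check(trials=1000, seed=0):
    """Test D1 on random integer matrices of random sizes (a check, not a proof)."""
    rng = random.Random(seed)
    for _ in range(trials):
        m, n = rng.randint(1, 4), rng.randint(1, 4)
        X = [[rng.randint(-9, 9) for _ in range(n)] for _ in range(m)]
        Y = [[rng.randint(-9, 9) for _ in range(n)] for _ in range(m)]
        if not distributive_holds(rng.randint(-9, 9), X, Y):
            return False
    return True


print(random_check())  # True
```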


3.2 Matrix multiplication 


In the previous subsection you saw that matrix addition and scalar 
multiplication can be defined in terms of matrix entries, much like the 
corresponding operations for vectors. In this subsection you will revise 
matrix multiplication, which can also be defined in terms of matrix entries, 
much like the corresponding operation for vectors — the scalar product. 


Recall from Unit A1 that the scalar product of two vectors $\mathbf{a} = (a_1, a_2, a_3)$
and $\mathbf{b} = (b_1, b_2, b_3)$ in $\mathbb{R}^3$ is

$$\mathbf{a} \cdot \mathbf{b} = a_1b_1 + a_2b_2 + a_3b_3.$$
Matrix multiplication is a generalisation of this idea. 


To form the product of two matrices A and B, we combine the rows of A 
with the columns of B. The (i, j)-entry of the product AB is obtained by 
multiplying each entry in the ith row of A by the corresponding entry in 
the jth column of B and adding the results. 




This product is only possible if the number of columns of A is equal to the 
number of rows of B. 


For example, let 


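$$
\mathbf{A} = \begin{pmatrix} 1 & 2 & 3\\ 4 & 5 & 6 \end{pmatrix}
\quad\text{and}\quad
\mathbf{B} = \begin{pmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9 \end{pmatrix}.
$$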


The number of columns of A is equal to the number of rows of B, so it is 
possible to find the product AB. 


To obtain the (1,1)-entry of the product AB we combine the first row of 
A with the first column of B: 


$$(1 \times 1) + (2 \times 4) + (3 \times 7) = 1 + 8 + 21 = 30.$$


Next, to obtain the (1, 2)-entry of the product AB, we combine the first 
row of A with the second column of B: 


$$(1 \times 2) + (2 \times 5) + (3 \times 8) = 2 + 10 + 24 = 36.$$


Then to obtain the (1,3)-entry of the product AB, we combine the first 
row of A with the third column of B: 


$$(1 \times 3) + (2 \times 6) + (3 \times 9) = 3 + 12 + 27 = 42.$$


To obtain the entries in the second row of the product AB, we combine 
the second row of A with each of the columns of B in turn. 


In the end we obtain $2 \times 3$ entries in the product AB; this matrix has 2
rows and 3 columns as follows.


$$
\begin{pmatrix} 1 & 2 & 3\\ 4 & 5 & 6 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9 \end{pmatrix}
= \begin{pmatrix} 30 & 36 & 42\\ 66 & 81 & 96 \end{pmatrix}
$$

One way to remember how to multiply matrices A and B is to picture
running along the rows of A and then diving down the columns of B. The
example pictured in Figure 13 gives the (1, 2)-entry.

[Figure 13: Running along and diving in]


Definition 

The product of an $m \times n$ matrix A with an $n \times p$ matrix B is the
$m \times p$ matrix AB whose (i, j)-entry is obtained by multiplying each
entry in the ith row of A by the corresponding entry in the jth
column of B and adding the results.

In symbols, if $\mathbf{C} = \mathbf{AB}$, then
$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.$$


The product AB is not defined when the number of columns of the 
matrix A is not equal to the number of rows of the matrix B. 




Schematically, this can be shown as follows. 


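$$
\underset{m \times n}{\mathbf{A}}\;\underset{n \times p}{\mathbf{B}} = \underset{m \times p}{\mathbf{AB}}:
\qquad \text{row } i \text{ of } \mathbf{A} \text{ with column } j \text{ of } \mathbf{B} \text{ gives the } (i, j)\text{-entry of } \mathbf{AB}.
$$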
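The definition translates directly into a sum over the shared index for each pair (i, j). This Python sketch (the name `matmul` and the list-of-rows representation are assumptions made for illustration) reproduces the product AB computed above and refuses a product that is not defined.

```python
def matmul(A, B):
    """Return AB for an m x n matrix A and an n x p matrix B.

    The (i, j)-entry is a_i1*b_1j + a_i2*b_2j + ... + a_in*b_nj: run along
    row i of A and dive down column j of B.
    """
    m, n = len(A), len(A[0])
    if len(B) != n:
        raise ValueError(f"AB is not defined: A has {n} columns "
                         f"but B has {len(B)} rows")
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(matmul(A, B))  # [[30, 36, 42], [66, 81, 96]]
```

Here BA is not defined, since B has 3 columns but A has only 2 rows, so `matmul(B, A)` raises a `ValueError`.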


Worked Exercise C14 


Evaluate (where possible) the matrix products AB, where: 


(a) [The matrices A, which is $2 \times 2$, and B, which is $2 \times 3$, are not recoverable from the source.]


Solution 


(a) The matrix A has 2 columns and the matrix B has 2 rows, so
the product AB can be formed.


The product of a $2 \times 2$ matrix with a $2 \times 3$ one is a $2 \times 3$ matrix.


When evaluating a product of matrices, it is advisable to find
the entries systematically, either row by row, or column by
column. Here, we find the entries row by row.


To find the (1, 1)-entry of AB, we multiply each entry in the first 
row of A by the corresponding entry in the first column of B: 


$$(2 \times 3) + (1 \times 1) = 7.$$


Next, to find the (1, 2)-entry of AB, we apply the same 
procedure to the first row of A and the second column of B: 


[The calculation is not recoverable from the source; it gives the value $-3$.]


Next, to find the (1,3)-entry of AB, we apply the same 
procedure to the first row of A and the third column of B: 


$$(2 \times 0) + (1 \times 4) = 4.$$
Together, these give the first row of AB:


$$
\begin{pmatrix} 7 & -3 & 4\\ \cdot & \cdot & \cdot \end{pmatrix}
$$
We continue by finding the (2, 1)-entry of AB, then the
(2, 2)-entry and finally the (2, 3)-entry, by applying the same
procedure to the second row of A with the columns of B in turn.
This gives the second row of the product AB:




Exercise C20 


Evaluate the following matrix products, where possible. 
[The matrices in this exercise are not recoverable from the source.]


In the previous subsection you saw that addition on the set $M_{m,n}$ of $m \times n$
matrices satisfies the usual properties (A1–A5) for addition. For
multiplication of matrices things are not so straightforward. To start with,
if $m \neq n$ then the product of two matrices in the set $M_{m,n}$ is not even
defined.


So when we consider properties of matrix multiplication we are only 
interested in products that can be formed. For example, we can say that 
matrix multiplication is associative because, whenever these products can 
be formed, (AB)C = A(BC). You will prove this result in Unit C3. 


In the next exercise you are asked to prove that matrix multiplication is 
not commutative. 


Exercise C21 


(a) Prove that the products AB and BA are the same size if and only 
if A and B are square matrices of the same size. 


(b) Prove that matrix multiplication of square matrices of the same size is
not commutative by giving a counterexample; that is, find two $2 \times 2$
matrices A and B such that $\mathbf{AB} \neq \mathbf{BA}$.


The fact that matrix multiplication is not commutative means that it is 
important to describe a matrix product carefully. We say that AB is the 
matrix A multiplied on the right by the matrix B, or the matrix B 
multiplied on the left by the matrix A. 




You have seen that the distributive property (D1) holds for multiplication 
of a matrix by a scalar. Matrix multiplication is also distributive because, 
whenever these products can be formed, A(B + C) = AB + AC. The proof 
of this is not hard, but it is not very illuminating, so is not given here. 


Diagonal and triangular matrices 


The entries of a square matrix from the top left-hand corner to the bottom 
right-hand corner are the diagonal entries; the diagonal entries form the 
main diagonal of the matrix. In some texts the main diagonal is called 
the leading or principal diagonal. For a square matrix $\mathbf{A} = (a_{ij})$ of size
$n \times n$, the diagonal entries are

$$a_{11},\ a_{22},\ \ldots,\ a_{nn}.$$


A matrix that has its only non-zero entries on the main diagonal can be 
useful. 


Definition 


A diagonal matrix is a square matrix each of whose non-diagonal 
entries is zero. 


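For example, the following are diagonal matrices:

[A $2 \times 2$ diagonal matrix, not recoverable from the source] and

$$
\begin{pmatrix} 3 & 0 & 0\\ 0 & -7 & 0\\ 0 & 0 & 0 \end{pmatrix}.
$$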


To see how diagonal matrices multiply, try the following exercise. 


Exercise C22 


Let A, B and C be the three diagonal matrices of this exercise [their entries are not recoverable from the source].

Evaluate the following products.
(a) AB (b) BA (c) ABC


The product of two diagonal matrices is another diagonal matrix, and the 
ith diagonal entry of the product is the product of the ith diagonal entries 
of the matrices being multiplied. Multiplication of diagonal matrices is 
therefore commutative. 
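A quick numerical illustration of this fact, as a Python sketch. The helper `diagonal` is a hypothetical name introduced here; the first matrix is the $3 \times 3$ diagonal example given earlier, and the second is chosen only for illustration.

```python
def matmul(A, B):
    """Product AB (sizes assumed compatible)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def diagonal(*entries):
    """Hypothetical helper: the square diagonal matrix with these diagonal entries."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]


D = diagonal(3, -7, 0)   # the 3 x 3 diagonal example above
E = diagonal(2, 5, 4)    # a second diagonal matrix, chosen for illustration
print(matmul(D, E))                  # [[6, 0, 0], [0, -35, 0], [0, 0, 0]]
print(matmul(D, E) == matmul(E, D))  # True
```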


More generally, positive powers of square matrices are defined as 
expected: 


$$\mathbf{A}^2 = \mathbf{AA},\quad \mathbf{A}^3 = \mathbf{AAA},\quad \ldots$$


Therefore, finding powers of diagonal matrices is straightforward and you 
will see how this fact can be used to find powers of other square matrices 
in Unit C4 Eigenvectors. 
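A short Python sketch of positive powers by repeated multiplication (the function `power` is hypothetical). For a diagonal matrix the result simply raises each diagonal entry to the power; the second example, with a matrix chosen for illustration, shows that this shortcut does not carry over to non-diagonal matrices.

```python
def matmul(A, B):
    """Product AB (sizes assumed compatible)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def power(A, k):
    """A^k for a square matrix A and an integer k >= 1 (A^2 = AA, A^3 = AAA, ...)."""
    result = A
    for _ in range(k - 1):
        result = matmul(result, A)
    return result


D = [[2, 0], [0, -3]]
print(power(D, 5))  # [[32, 0], [0, -243]]: the 5th powers of the diagonal entries

N = [[1, 1], [0, 1]]
print(power(N, 3))  # [[1, 3], [0, 1]], not the entrywise cubes [[1, 1], [0, 1]]
```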


A square matrix with each entry below the main diagonal equal to zero is 
called an upper triangular matrix. Similarly, a square matrix with each 
entry above the main diagonal equal to zero is called a lower triangular 
matrix. A square row-reduced matrix is an upper triangular matrix. A 
square matrix that is both upper triangular and lower triangular is 
necessarily a diagonal matrix. 


Exercise C23 


State which of the following matrices are diagonal, upper triangular or 
lower triangular. 


[The four matrices in parts (a) to (d) are not recoverable from the source.]


Identity matrix 


You have seen that there are matrices corresponding to the number 0,
which is the additive identity in the reals. These matrices are the zero
matrices $\mathbf{0}_{m,n}$, each of which is the additive identity in $M_{m,n}$. There are
also matrices corresponding to the number 1, which is the multiplicative
identity in the reals. These matrices are square matrices called the identity
matrices, denoted by $\mathbf{I}_n$. The subscript n indicates that the matrix is an
$n \times n$ matrix; however, as with the zero matrix, the identity matrix is
written simply as I when the size is clear from the context.


Definition 


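The identity matrix $\mathbf{I}_n$ is the $n \times n$ matrix

$$
\begin{pmatrix}
1 & 0 & \cdots & 0 & 0\\
0 & 1 & \cdots & 0 & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & \cdots & 1 & 0\\
0 & 0 & \cdots & 0 & 1
\end{pmatrix}.
$$

Each of the entries is 0 except those on the main diagonal, which are
all 1.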


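For example, the identity matrices $\mathbf{I}_2$, $\mathbf{I}_3$ and $\mathbf{I}_4$ are

$$
\begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix},\quad
\begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}.
$$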




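If we multiply a $3 \times 2$ matrix on the left by $\mathbf{I}_3$ we obtain

$$
\begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} a & b\\ c & d\\ e & f \end{pmatrix}
= \begin{pmatrix} a & b\\ c & d\\ e & f \end{pmatrix}.
$$

If we multiply the same $3 \times 2$ matrix on the right by $\mathbf{I}_2$ we obtain

$$
\begin{pmatrix} a & b\\ c & d\\ e & f \end{pmatrix}
\begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} a & b\\ c & d\\ e & f \end{pmatrix}.
$$

Here, a, b, c, d, e and f are any real numbers. In both cases, the matrix is
unchanged.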
Theorem C2 


Let A be an $m \times n$ matrix. Then

$$\mathbf{I}_m\mathbf{A} = \mathbf{A} = \mathbf{A}\mathbf{I}_n.$$


You are asked to prove this theorem in the next exercise. 


Exercise C24 


Let $\mathbf{A} = (a_{ij})$ be an $m \times n$ matrix. Prove Theorem C2; that is, prove that
$\mathbf{I}_m\mathbf{A} = \mathbf{A}$ and $\mathbf{A}\mathbf{I}_n = \mathbf{A}$.

Hint: Notice that the entries in the ith row of $\mathbf{I}_m$ are all 0 except the entry
in the ith position, which is 1.
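A numerical spot-check of Theorem C2, which is not a proof (that is the job of Exercise C24). The Python sketch below, with hypothetical helper names, multiplies an arbitrarily chosen $3 \times 2$ matrix by $\mathbf{I}_3$ on the left and by $\mathbf{I}_2$ on the right.

```python
def matmul(A, B):
    """Product AB (sizes assumed compatible)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def identity(n):
    """The n x n identity matrix I_n."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


A = [[1, 2], [3, 4], [5, 6]]  # an arbitrary 3 x 2 matrix chosen for illustration
m, n = len(A), len(A[0])
print(matmul(identity(m), A) == A)  # True: I_3 A = A
print(matmul(A, identity(n)) == A)  # True: A I_2 = A
```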


3.3 Transposition of matrices 


There is a simple operation that we can perform on matrices. This 
operation, called transposition or taking the transpose, entails 
interchanging the rows with the columns of the matrix. Thus the transpose 
of the matrix A, denoted by AT, has the rows of A as its columns, taken 
in the same order. For example, 


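$$
\begin{pmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9 \end{pmatrix}^{T}
= \begin{pmatrix} 1 & 4 & 7\\ 2 & 5 & 8\\ 3 & 6 & 9 \end{pmatrix}
$$

[A second example is not recoverable from the source.]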


Transposition of a square matrix can be thought of as reflecting the matrix
in the main diagonal, as illustrated in Figure 14.

[Figure 14: Transposition as reflecting in the main diagonal]

Definition


The transpose of an $m \times n$ matrix A is the $n \times m$ matrix $\mathbf{A}^T$ whose
(i, j)-entry is the (j, i)-entry of A.
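The definition can be written as a one-line Python sketch (the name `transpose` is an assumption for illustration). The first example is the $3 \times 3$ matrix transposed above; the second shows a $1 \times 3$ matrix becoming $3 \times 1$.

```python
def transpose(A):
    """A^T: the (i, j)-entry of A^T is the (j, i)-entry of A."""
    m, n = len(A), len(A[0])
    return [[A[j][i] for j in range(m)] for i in range(n)]


A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(transpose(A))            # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(transpose([[1, 2, 3]]))  # [[1], [2], [3]]
```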




Exercise C25 


Write down the transpose of each of the following matrices. 


[The four matrices in parts (a) to (d) are not recoverable from the source; the matrix in part (d) is a diagonal matrix.]


The identity matrix I is not changed by taking the transpose; that is,
$\mathbf{I}^T = \mathbf{I}$. In fact, $\mathbf{A}^T = \mathbf{A}$ for all diagonal matrices A; you saw one such
example in Exercise C25(d).


The operation of transposition has some other useful properties as you will 
now see. 


The rows of the matrix A form the columns of the matrix $\mathbf{A}^T$, and the
columns of $\mathbf{A}^T$ form the rows of $(\mathbf{A}^T)^T$. Therefore the rows of A form the
rows of $(\mathbf{A}^T)^T$; that is, these two matrices are equal:

$$(\mathbf{A}^T)^T = \mathbf{A}.$$


Exercise C26 


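Let

$$
\mathbf{A} = \begin{pmatrix} 1 & 2\\ 3 & 4\\ 5 & 6 \end{pmatrix}
\quad\text{and}\quad
\mathbf{B} = \begin{pmatrix} 7 & 8\\ 9 & 10\\ 11 & 12 \end{pmatrix},
$$

and let C be the matrix of this exercise [its entries are not recoverable from the source].

(a) Find $\mathbf{A}^T$, $\mathbf{B}^T$ and $(\mathbf{A} + \mathbf{B})^T$, and verify that $(\mathbf{A} + \mathbf{B})^T = \mathbf{A}^T + \mathbf{B}^T$.

(b) Find $\mathbf{C}^T$ and $(\mathbf{AC})^T$, and find an equation relating $(\mathbf{AC})^T$, $\mathbf{A}^T$
and $\mathbf{C}^T$.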


The relationships satisfied by the matrices in Exercise C26 hold in general. 


Properties of matrix transposition 

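Let A and B be $m \times n$ matrices. Then:

1. $(\mathbf{A}^T)^T = \mathbf{A}$
2. $(\mathbf{A} + \mathbf{B})^T = \mathbf{A}^T + \mathbf{B}^T$.

Let A be an $m \times n$ matrix and B an $n \times p$ matrix. Then

3. $(\mathbf{AB})^T = \mathbf{B}^T\mathbf{A}^T$.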
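The three properties can be spot-checked numerically. In this Python sketch (hypothetical helper names), A and B are the $3 \times 2$ matrices of Exercise C26, while C is a $2 \times 2$ matrix chosen only for illustration, not the C of that exercise.

```python
def transpose(A):
    """A^T: the (i, j)-entry of A^T is the (j, i)-entry of A."""
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def matmul(A, B):
    """Product AB (sizes assumed compatible)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C = [[2, -1], [0, 3]]  # hypothetical 2 x 2 matrix

print(transpose(transpose(A)) == A)                                   # True
print(transpose(add(A, B)) == add(transpose(A), transpose(B)))        # True
print(transpose(matmul(A, C)) == matmul(transpose(C), transpose(A)))  # True
```

Here $\mathbf{A}^T\mathbf{C}^T$ is not even defined, since $\mathbf{A}^T$ is $2 \times 3$ and $\mathbf{C}^T$ is $2 \times 2$, which is one way to see why property 3 reverses the order.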




Symmetric matrices 


Some square matrices remain unchanged when transposed. These matrices 
are called symmetric matrices, since they are symmetrical about the main 
diagonal. 


Definition 


A square matrix A is symmetric if $\mathbf{A}^T = \mathbf{A}$.


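Since $\mathbf{A}^T = \mathbf{A}$ for all diagonal matrices, all diagonal matrices are
symmetric. Here are other examples of symmetric matrices:

[A further $2 \times 2$ example is not recoverable from the source.]

$$
\begin{pmatrix} 1 & 2 & 3 & 4\\ 2 & 5 & 6 & 7\\ 3 & 6 & 8 & 9\\ 4 & 7 & 9 & 10 \end{pmatrix},
\qquad
\begin{pmatrix} -5 & 2\\ 2 & 3 \end{pmatrix}
$$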
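Checking symmetry amounts to comparing each entry above the main diagonal with its mirror image below it. Here is a Python sketch with a hypothetical function name, applied to the examples above and to two matrices that are not symmetric.

```python
def is_symmetric(A):
    """True if A is square and A^T = A, that is, a_ij = a_ji for all i, j."""
    n = len(A)
    if any(len(row) != n for row in A):
        return False  # only square matrices can be symmetric
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(i + 1, n))


S = [[1, 2, 3, 4],
     [2, 5, 6, 7],
     [3, 6, 8, 9],
     [4, 7, 9, 10]]
print(is_symmetric(S))                  # True
print(is_symmetric([[-5, 2], [2, 3]]))  # True
print(is_symmetric([[1, 2], [3, 4]]))   # False
print(is_symmetric([[1, 2, 3]]))        # False: not square
```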


3.4 Matrix form of a system of linear equations


In this subsection you will see how a system of linear equations can be 
expressed in matrix form as a product of matrices. This contrasts with the 
augmented matrices you met in Subsection 2.1, which are an abbreviated 
notation for the system and involve no products of matrices. 


Consider the following system of linear equations. 


$$
\begin{aligned}
x_1 + 2x_2 + 4x_3 &= 6\\
x_2 + x_3 &= 1\\
x_1 + 3x_2 + 5x_3 &= 10
\end{aligned}
$$


We can write this system as a matrix equation: 


$$
\begin{pmatrix} x_1 + 2x_2 + 4x_3\\ x_2 + x_3\\ x_1 + 3x_2 + 5x_3 \end{pmatrix}
= \begin{pmatrix} 6\\ 1\\ 10 \end{pmatrix}.
$$


Now the $3 \times 1$ matrix on the left can be expressed as the product of two
matrices, namely the $3 \times 3$ matrix of the coefficients and the $3 \times 1$ matrix
of the unknowns:


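$$
\begin{pmatrix} x_1 + 2x_2 + 4x_3\\ x_2 + x_3\\ x_1 + 3x_2 + 5x_3 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 4\\ 0 & 1 & 1\\ 1 & 3 & 5 \end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}.
$$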


Thus we have the matrix equation 


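$$
\begin{pmatrix} 1 & 2 & 4\\ 0 & 1 & 1\\ 1 & 3 & 5 \end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}
= \begin{pmatrix} 6\\ 1\\ 10 \end{pmatrix}.
$$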


Similarly, we can express any system of linear equations 


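$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1,\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2,\\
&\;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m,
\end{aligned}
$$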


as a matrix product. Let the matrix of coefficients be A, the coefficient 
matrix of the system, that is, 


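$$
\mathbf{A} =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}.
$$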


Let the matrix of unknowns be x, and let the matrix of constant terms 
be b, so 


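$$
\mathbf{x} = \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix}
\quad\text{and}\quad
\mathbf{b} = \begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{pmatrix}.
$$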


The system can then be expressed in matrix form as 


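$$\mathbf{Ax} = \mathbf{b},$$
or in full as
$$
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{pmatrix}.
$$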


Writing a system of linear equations in matrix form will allow us to 
manipulate the system using matrix multiplication. 
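Once the system is written as $\mathbf{Ax} = \mathbf{b}$, checking a proposed solution is a single matrix product. A minimal Python sketch using the coefficient matrix and constant terms of the system above; the function name `matvec` and the trial vector (1, 1, 1) are chosen only for illustration.

```python
def matvec(A, x):
    """Ax, where A is an m x n coefficient matrix and x lists the n unknowns."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


A = [[1, 2, 4],
     [0, 1, 1],
     [1, 3, 5]]
b = [6, 1, 10]

x = [1, 1, 1]
print(matvec(A, x))       # [7, 2, 9]: the left-hand sides evaluated at x
print(matvec(A, x) == b)  # False, so (1, 1, 1) does not satisfy Ax = b
```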




4 Matrix inverses 


In this section you will investigate the multiplicative properties of square 
matrices and the existence of multiplicative inverses. 


4.1 Matrix inverses 


In Section 3 you saw that matrix addition in $M_{m,n}$ satisfies the usual
properties (A1–A5) for addition, but things are not so straightforward for
multiplication of matrices.


If we restrict our attention to the set $M_{n,n}$ of square matrices with real
entries, then products of these matrices can always be formed, and so the
following properties hold.


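Multiplication in $M_{n,n}$

M1 Closure For all $\mathbf{A}, \mathbf{B} \in M_{n,n}$,
$$\mathbf{AB} \in M_{n,n}.$$

M2 Associativity For all $\mathbf{A}, \mathbf{B}, \mathbf{C} \in M_{n,n}$,
$$(\mathbf{AB})\mathbf{C} = \mathbf{A}(\mathbf{BC}).$$

M3 Multiplicative identity For all $\mathbf{A} \in M_{n,n}$,
$$\mathbf{A}\mathbf{I}_n = \mathbf{A} = \mathbf{I}_n\mathbf{A}.$$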


The closure property (M1) follows from the definition of matrix
multiplication and the associative property (M2) will be proved in
Unit C3. The multiplicative identity property (M3) holds by Theorem C2
and we say that $\mathbf{I}_n$ is the multiplicative identity in $M_{n,n}$.


You saw that matrix multiplication is not commutative, even for square
matrices, and so the commutative property (M5) does not hold for matrix
multiplication in $M_{n,n}$. The distributive property (D1) does hold for
matrix addition and matrix multiplication in $M_{n,n}$; that is, matrix
multiplication is distributive over matrix addition. However, because
matrix multiplication is not commutative we have to consider multiplying
on the right and left separately.


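Combining addition and multiplication in $M_{n,n}$
D1 Distributivity For all $\mathbf{A}, \mathbf{B}, \mathbf{C} \in M_{n,n}$,

$$\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{AB} + \mathbf{AC}$$
and
$$(\mathbf{A} + \mathbf{B})\mathbf{C} = \mathbf{AC} + \mathbf{BC}.$$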


You may have noticed that one other property is missing from the list of
multiplicative properties, namely the multiplicative inverses property (M4).


Recall, from Exercise C21(a), that the products AB and BA are the same 
size if and only if A and B are square matrices of the same size. 


We say that B is a multiplicative inverse of A in $M_{n,n}$ if $\mathbf{A}, \mathbf{B} \in M_{n,n}$
and $\mathbf{AB} = \mathbf{I}_n = \mathbf{BA}$. Because the additive inverse of a matrix is
usually called its negative, the multiplicative inverse of a matrix is
usually called simply its inverse when the context is clear.


We now investigate the existence of multiplicative inverses. 


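Many square matrices do have multiplicative inverses; for example,

$$
\begin{pmatrix} 3 & -1\\ -5 & 2 \end{pmatrix}
\text{ is an inverse of }
\begin{pmatrix} 2 & 1\\ 5 & 3 \end{pmatrix},
$$

since

$$
\begin{pmatrix} 2 & 1\\ 5 & 3 \end{pmatrix}
\begin{pmatrix} 3 & -1\\ -5 & 2 \end{pmatrix}
= \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}
$$

and

$$
\begin{pmatrix} 3 & -1\\ -5 & 2 \end{pmatrix}
\begin{pmatrix} 2 & 1\\ 5 & 3 \end{pmatrix}
= \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}.
$$

Similarly,

$$
\begin{pmatrix} -1 & -5 & -2\\ 0 & 2 & 1\\ -2 & 1 & 1 \end{pmatrix}
\text{ is an inverse of }
\begin{pmatrix} 1 & 3 & -1\\ -2 & -5 & 1\\ 4 & 11 & -2 \end{pmatrix},
$$

since

$$
\begin{pmatrix} 1 & 3 & -1\\ -2 & -5 & 1\\ 4 & 11 & -2 \end{pmatrix}
\begin{pmatrix} -1 & -5 & -2\\ 0 & 2 & 1\\ -2 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
$$

and

$$
\begin{pmatrix} -1 & -5 & -2\\ 0 & 2 & 1\\ -2 & 1 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 3 & -1\\ -2 & -5 & 1\\ 4 & 11 & -2 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}.
$$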
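Checking a claimed inverse means computing both products, since the definition requires $\mathbf{AB} = \mathbf{I}$ and $\mathbf{BA} = \mathbf{I}$. This Python sketch (hypothetical helper names) confirms both pairs above; the last line shows a matrix that is not its own inverse.

```python
def matmul(A, B):
    """Product AB (sizes assumed compatible)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def identity(n):
    """The n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def is_inverse(B, A):
    """True if AB = I = BA, that is, B is an inverse of the square matrix A."""
    I = identity(len(A))
    return matmul(A, B) == I and matmul(B, A) == I


print(is_inverse([[3, -1], [-5, 2]], [[2, 1], [5, 3]]))     # True
print(is_inverse([[-1, -5, -2], [0, 2, 1], [-2, 1, 1]],
                 [[1, 3, -1], [-2, -5, 1], [4, 11, -2]]))  # True
print(is_inverse([[2, 1], [5, 3]], [[2, 1], [5, 3]]))       # False
```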


Just as a real number has at most one multiplicative inverse, or reciprocal, 
a square matrix has at most one inverse, as we now prove. 


Theorem C3 


If a square matrix has an inverse, then this inverse is unique. 




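Proof Let A be a square matrix, and suppose that B and C are both
inverses of A. Then $\mathbf{AB} = \mathbf{I} = \mathbf{BA}$ and $\mathbf{AC} = \mathbf{I} = \mathbf{CA}$.

We consider the product $\mathbf{CAB} = \mathbf{C}(\mathbf{AB}) = (\mathbf{CA})\mathbf{B}$.

Multiplying the equation $\mathbf{AB} = \mathbf{I}$ on the left by C, we have
$$\mathbf{C}(\mathbf{AB}) = \mathbf{CI} = \mathbf{C},$$
while multiplying the equation $\mathbf{CA} = \mathbf{I}$ on the right by B gives
$$(\mathbf{CA})\mathbf{B} = \mathbf{IB} = \mathbf{B}.$$

Since matrix multiplication is associative, $\mathbf{C}(\mathbf{AB}) = (\mathbf{CA})\mathbf{B}$, and it follows that $\mathbf{B} = \mathbf{C}$. ■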


Certainly a square zero matrix has no inverse (just as the real number 0 
has no reciprocal), since if 0 is a square zero matrix, then any product of 0 
and another matrix is a zero matrix, and so there is no matrix B such that 
$\mathbf{0B} = \mathbf{I}$. However, it is natural to ask whether or not every non-zero
square matrix has an inverse. The next exercise demonstrates that the 
answer to this question is no: it gives an example of a non-zero square 
matrix with no inverse. 


Exercise C27 


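Let A be the $2 \times 2$ matrix of this exercise, whose first row is $(1\ \ {-1})$ [the second row is not recoverable from the source].
Prove that there is no matrix B such that $\mathbf{AB} = \mathbf{I}$.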


In fact, there are many non-zero square matrices with no inverse. The next 
theorem gives an infinite class of such matrices. 


Theorem C4 


A square matrix with a zero row has no inverse. 


Proof Let A be a square matrix, one of whose rows, say row i, is a zero 
row. Then if B is any matrix of the same size as A, the (i, i)-entry of AB 
is 0, since it is obtained by multiplying each entry in row i of A (a zero 
row) by the corresponding entry in column i of B. But the (i, i)-entry of I 
is 1, which shows that there is no matrix B such that AB =I. Hence A 
has no inverse. ■


Definition 


A square matrix that has an inverse is called invertible. 


An invertible matrix is necessarily a square matrix in order for the 
products AB and BA to exist and be equal. 


Since we know by Theorem C3 that if a matrix has an inverse, then this
inverse is unique, we denote the unique inverse of an invertible matrix A
by $\mathbf{A}^{-1}$. Thus, for any invertible matrix A,

$$\mathbf{A}\mathbf{A}^{-1} = \mathbf{I} = \mathbf{A}^{-1}\mathbf{A}.$$

Both A and $\mathbf{A}^{-1}$ are square matrices of the same size. It follows from
these equations that if A is an invertible matrix, then $\mathbf{A}^{-1}$ is also
invertible, with inverse A; that is,

$$(\mathbf{A}^{-1})^{-1} = \mathbf{A}.$$

In other words, the matrices A and $\mathbf{A}^{-1}$ are inverses of each other.


The next worked exercise and the following exercises give some other 
useful facts about inverses of matrices. 


Worked Exercise C15 


Let A be an invertible matrix. Prove that $\mathbf{A}^T$ is invertible, and that
$$(\mathbf{A}^T)^{-1} = (\mathbf{A}^{-1})^T.$$


Exercise C28 


Prove that the identity matrix I is invertible, and that $\mathbf{I}^{-1} = \mathbf{I}$.


Exercise C29 


Let A and B be invertible matrices of the same size. Prove that AB is
invertible, and that $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$.




Notice the reversal of the order of the matrices in the identity
$$(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}.$$


The result of Exercise C29 extends to products of any number of matrices;
the extension can be proved from that result by mathematical induction.


Theorem C5 


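Let $\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_k$ be invertible matrices of the same size. Then the
product $\mathbf{A}_1\mathbf{A}_2\cdots\mathbf{A}_k$ is invertible, with

$$(\mathbf{A}_1\mathbf{A}_2\cdots\mathbf{A}_k)^{-1} = \mathbf{A}_k^{-1}\cdots\mathbf{A}_2^{-1}\mathbf{A}_1^{-1}.$$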
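A numerical illustration of Theorem C5 with $k = 3$, as a Python sketch with hypothetical helper names. The first two matrices and their inverses appear elsewhere in this unit (the inverse of the second is the one found with the Invertibility Theorem in Subsection 4.2); the third is chosen only for illustration, and exact fractions avoid rounding. Taking the inverses in reversed order inverts the product; in the original order they do not.

```python
from fractions import Fraction


def matmul(A, B):
    """Product AB (sizes assumed compatible)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


# Invertible 2 x 2 matrices paired with their inverses.
A1, A1_inv = [[2, 1], [5, 3]], [[3, -1], [-5, 2]]
A2, A2_inv = [[1, 3], [2, 9]], [[3, -1], [Fraction(-2, 3), Fraction(1, 3)]]
A3, A3_inv = [[1, 1], [0, 1]], [[1, -1], [0, 1]]  # chosen for illustration

product = matmul(matmul(A1, A2), A3)
reversed_inverses = matmul(matmul(A3_inv, A2_inv), A1_inv)
same_order_inverses = matmul(matmul(A1_inv, A2_inv), A3_inv)

I = identity(2)
print(matmul(product, reversed_inverses) == I)    # True
print(matmul(reversed_inverses, product) == I)    # True
print(matmul(product, same_order_inverses) == I)  # False
```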


You saw in Subsection 3.1 that $(M_{m,n}, +)$, the set of all $m \times n$ matrices
with real entries, forms a group under addition. The results of
Exercises C28 and C29, together with the properties M1–M3 for matrix
multiplication in $M_{n,n}$, can be used to show that the set of all invertible
matrices of a particular size and with real entries forms a group under
matrix multiplication. The restriction of the set to include only invertible
matrices is important: without this, the axiom G4 (inverses) clearly fails
since, for example, the zero matrix has no inverse.


Theorem C6 


The set of all invertible $n \times n$ matrices with real entries forms a group
under matrix multiplication.


Proof We check the four group axioms. 


G1 Closure Exercise C29 showed that if A and B are invertible $n \times n$
matrices then their product AB is invertible. The product AB is an
$n \times n$ matrix, so group axiom G1 (closure) holds for this set.


G2 Associativity The associative property (M2) holds for matrix
multiplication in $M_{n,n}$, so group axiom G2 (associativity) holds.


G3 Identity The identity property (M3) holds for matrix multiplication
in $M_{n,n}$, and Exercise C28 shows that $\mathbf{I}_n$ is in the set of all invertible
$n \times n$ matrices with real entries. Therefore group axiom G3
(identity) holds.


G4 Inverses The set of all invertible $n \times n$ matrices with real entries is
a subset of $M_{n,n}$. By definition every matrix in the set of invertible
matrices has an inverse, and this inverse is itself invertible and
therefore in the set, so axiom G4 (inverses) holds.


Hence the set of all invertible $n \times n$ matrices with real entries under the
operation of matrix multiplication satisfies the four group axioms, and so
is a group. ■




4.2 Invertibility Theorem 


The following two questions may already have occurred to you as you 
worked through the previous subsection. First, how can we determine 
whether or not a given square matrix is invertible? Second, if we know 
that a matrix is invertible, how can we find its inverse? The next theorem, 
which we will prove in Subsection 4.5, answers both these questions. 


Theorem C7 Invertibility Theorem 
(a) A square matrix is invertible if and only if its row-reduced form 
is I. 


(b) Any sequence of elementary row operations that transforms a
matrix A to I also transforms I to $\mathbf{A}^{-1}$.


To illustrate this theorem, consider the matrix 


$$
\mathbf{A} = \begin{pmatrix} 1 & 3\\ 2 & 9 \end{pmatrix}.
$$

Suppose that we wish to determine whether or not A is invertible and, if it
is, to find $\mathbf{A}^{-1}$.


Below, on the left, we row-reduce A in the usual way. On the right, we 
perform the same sequence of elementary row operations on the $2 \times 2$
identity matrix. 


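$$
\begin{array}{l} r_1\\ r_2 \end{array}
\begin{pmatrix} 1 & 3\\ 2 & 9 \end{pmatrix}
\qquad\qquad
\begin{array}{l} r_1\\ r_2 \end{array}
\begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}
$$

$$
\begin{array}{l} \mathstrut\\ r_2 \to r_2 - 2r_1 \end{array}
\begin{pmatrix} 1 & 3\\ 0 & 3 \end{pmatrix}
\qquad\qquad
\begin{array}{l} \mathstrut\\ r_2 \to r_2 - 2r_1 \end{array}
\begin{pmatrix} 1 & 0\\ -2 & 1 \end{pmatrix}
$$

$$
\begin{array}{l} \mathstrut\\ r_2 \to \tfrac{1}{3}r_2 \end{array}
\begin{pmatrix} 1 & 3\\ 0 & 1 \end{pmatrix}
\qquad\qquad
\begin{array}{l} \mathstrut\\ r_2 \to \tfrac{1}{3}r_2 \end{array}
\begin{pmatrix} 1 & 0\\ -\tfrac{2}{3} & \tfrac{1}{3} \end{pmatrix}
$$

$$
\begin{array}{l} r_1 \to r_1 - 3r_2\\ \mathstrut \end{array}
\begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}
\qquad\qquad
\begin{array}{l} r_1 \to r_1 - 3r_2\\ \mathstrut \end{array}
\begin{pmatrix} 3 & -1\\ -\tfrac{2}{3} & \tfrac{1}{3} \end{pmatrix}
$$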


The row-reduced form of A is I, and so we conclude from the first part of 
the Invertibility Theorem that A is an invertible matrix. 


By the second part of the Invertibility Theorem, the final matrix on the 
right above must be AT}; that is, 


$$
\mathbf{A}^{-1} = \begin{pmatrix} 3 & -1\\ -\tfrac{2}{3} & \tfrac{1}{3} \end{pmatrix}.
$$


You should check that this matrix is indeed the inverse of A. 




To apply the Invertibility Theorem to find the inverse of a matrix A, we 
have to perform the same sequence of elementary row operations on both 
A and I. We can do this conveniently in the following way. We begin by 
writing A and I alongside each other, separated by a vertical line, giving a 
larger matrix, which we may denote by (A | I). We then row-reduce
(A | I) in the usual way (with the check column included). When we do
this, the above calculation is as follows. 


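\[
\begin{array}{l rr|rr|r}
r_1 & 1 & 3 & 1 & 0 & 5\\
r_2 & 2 & 9 & 0 & 1 & 12\\[6pt]
 & 1 & 3 & 1 & 0 & 5\\
r_2 \to r_2 - 2r_1 & 0 & 3 & -2 & 1 & 2\\[6pt]
 & 1 & 3 & 1 & 0 & 5\\
r_2 \to \tfrac13 r_2 & 0 & 1 & -\tfrac23 & \tfrac13 & \tfrac23\\[6pt]
r_1 \to r_1 - 3r_2 & 1 & 0 & 3 & -1 & 3\\
 & 0 & 1 & -\tfrac23 & \tfrac13 & \tfrac23
\end{array}
\]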


Thus the Invertibility Theorem (Theorem C7) gives us the following 
strategy. 


Strategy C3 


To determine whether or not a given square matrix A is invertible, 
and to find its inverse if it is, do the following. 


Write down (A | I), and row-reduce it until the left half is in 
row-reduced form. 


• If the left half is the identity matrix, then the right half is $A^{-1}$.


• Otherwise, A is not invertible.


You may find it helpful to remember the following scheme for this strategy: 
\[
(A \mid I) \;\longrightarrow\; (I \mid A^{-1})
\]


Strategy C3 is most useful for matrices of size 3 x 3 and larger. In Section 5 
you will revise a quick method for determining whether or not a 2 x 2 
matrix is invertible, and for writing down its inverse if it is invertible. 


Worked Exercise C16 


Determine whether or not each of the following matrices is invertible, and 
find the inverse if it exists. 


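\[
\text{(a)}\ A = \begin{pmatrix} 1 & 1 & 2\\ -1 & 0 & -4\\ 3 & 2 & 10 \end{pmatrix}
\qquad
\text{(b)}\ B = \begin{pmatrix} 1 & 3 & 5\\ 3 & 1 & 7\\ 2 & 4 & 8 \end{pmatrix}
\]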




Solution 


We use Strategy C3, and again add the row-sum check to help pick
up any arithmetical errors.


(a) We row-reduce the matrix (A | I). 


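\[
\begin{array}{l rrr|rrr|r}
r_1 & 1 & 1 & 2 & 1 & 0 & 0 & 5\\
r_2 & -1 & 0 & -4 & 0 & 1 & 0 & -4\\
r_3 & 3 & 2 & 10 & 0 & 0 & 1 & 16\\[6pt]
 & 1 & 1 & 2 & 1 & 0 & 0 & 5\\
r_2 \to r_2 + r_1 & 0 & 1 & -2 & 1 & 1 & 0 & 1\\
r_3 \to r_3 - 3r_1 & 0 & -1 & 4 & -3 & 0 & 1 & 1\\[6pt]
r_1 \to r_1 - r_2 & 1 & 0 & 4 & 0 & -1 & 0 & 4\\
 & 0 & 1 & -2 & 1 & 1 & 0 & 1\\
r_3 \to r_3 + r_2 & 0 & 0 & 2 & -2 & 1 & 1 & 2\\[6pt]
 & 1 & 0 & 4 & 0 & -1 & 0 & 4\\
 & 0 & 1 & -2 & 1 & 1 & 0 & 1\\
r_3 \to \tfrac12 r_3 & 0 & 0 & 1 & -1 & \tfrac12 & \tfrac12 & 1\\[6pt]
r_1 \to r_1 - 4r_3 & 1 & 0 & 0 & 4 & -3 & -2 & 0\\
r_2 \to r_2 + 2r_3 & 0 & 1 & 0 & -1 & 2 & 1 & 3\\
 & 0 & 0 & 1 & -1 & \tfrac12 & \tfrac12 & 1
\end{array}
\]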


The left half has been reduced to I, so A is invertible; its inverse 
is given by the right half, that is, 


\[
A^{-1} = \begin{pmatrix} 4 & -3 & -2\\ -1 & 2 & 1\\ -1 & \tfrac12 & \tfrac12 \end{pmatrix}.
\]
(b) We row-reduce (B | I). 


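\[
\begin{array}{l rrr|rrr|r}
r_1 & 1 & 3 & 5 & 1 & 0 & 0 & 10\\
r_2 & 3 & 1 & 7 & 0 & 1 & 0 & 12\\
r_3 & 2 & 4 & 8 & 0 & 0 & 1 & 15\\[6pt]
 & 1 & 3 & 5 & 1 & 0 & 0 & 10\\
r_2 \to r_2 - 3r_1 & 0 & -8 & -8 & -3 & 1 & 0 & -18\\
r_3 \to r_3 - 2r_1 & 0 & -2 & -2 & -2 & 0 & 1 & -5\\[6pt]
 & 1 & 3 & 5 & 1 & 0 & 0 & 10\\
r_2 \to -\tfrac18 r_2 & 0 & 1 & 1 & \tfrac38 & -\tfrac18 & 0 & \tfrac94\\
 & 0 & -2 & -2 & -2 & 0 & 1 & -5\\[6pt]
r_1 \to r_1 - 3r_2 & 1 & 0 & 2 & -\tfrac18 & \tfrac38 & 0 & \tfrac{13}4\\
 & 0 & 1 & 1 & \tfrac38 & -\tfrac18 & 0 & \tfrac94\\
r_3 \to r_3 + 2r_2 & 0 & 0 & 0 & -\tfrac54 & -\tfrac14 & 1 & -\tfrac12
\end{array}
\]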


The left half is now in row-reduced form, but is not the identity 
matrix. Therefore B is not invertible. 




If, for a general matrix A, it becomes clear while you are
row-reducing (A | I) that the left half will not reduce to the identity
matrix (for example, if a zero row appears in the left half), then you can 
stop the row-reduction immediately, and conclude that A is not invertible. 
There is no point in continuing until the left half is in row-reduced form. 
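Strategy C3, including the early stop just described, can be sketched in Python as follows. This is an illustration under our own assumptions (matrices as lists of rows, exact `Fraction` arithmetic, hypothetical function name `invert`). It works column by column and uses the first available non-zero entry as the pivot, which is one of several valid orders in which to perform the row operations.

```python
from fractions import Fraction


def invert(A):
    """Row-reduce (A | I) as in Strategy C3.

    Returns the right half (the inverse) if the left half reduces to I,
    or None as soon as it is clear that A is not invertible.
    """
    n = len(A)
    # Augmented matrix (A | I) with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Look for a non-zero entry in this column, at or below row `col`.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            # No leading 1 can appear here, so the left half cannot become I.
            return None
        M[col], M[pivot] = M[pivot], M[col]      # interchange two rows
        p = M[col][col]
        M[col] = [x / p for x in M[col]]          # make the leading entry 1
        for r in range(n):
            if r != col and M[r][col] != 0:
                c = M[r][col]
                # r_r -> r_r - c * r_col clears the rest of this column.
                M[r] = [x - c * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]                 # right half is A^{-1}


A = [[1, 1, 2], [-1, 0, -4], [3, 2, 10]]    # Worked Exercise C16(a)
B = [[1, 3, 5], [3, 1, 7], [2, 4, 8]]       # Worked Exercise C16(b)
print([[str(x) for x in row] for row in invert(A)])
# [['4', '-3', '-2'], ['-1', '2', '1'], ['-1', '1/2', '1/2']]
print(invert(B))  # None
```

For B, a zero row appears in the left half at the last column, so the function stops there, matching the conclusion of Worked Exercise C16(b).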


Exercise C30 


Use Strategy C3 to determine whether or not each of the following 
matrices is invertible, and find the inverse if it exists. 


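\[
\text{(a)}\ A = \begin{pmatrix} 2 & 4\\ 4 & 1 \end{pmatrix}
\qquad
\text{(b)}\ B = \begin{pmatrix} 1 & -4 & 2\\ 2 & 1 & -6\\ -3 & -1 & 9 \end{pmatrix}
\qquad
\text{(c)}\ C = \begin{pmatrix} 2 & 4 & 6\\ 1 & 2 & 4\\ 5 & 10 & 5 \end{pmatrix}
\]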


4.3 Invertibility and systems of linear equations


We can use matrix inverses to give us another method for solving certain 
systems of linear equations. 


Consider the system that we solved by Gauss-Jordan elimination in 
Worked Exercise C1. 

\[
\begin{aligned}
2x + 4y &= 10\\
4x + y &= 6
\end{aligned}
\]


You saw in Subsection 3.4 that such systems may be expressed in matrix 
form as 


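\[
\begin{pmatrix} 2 & 4\\ 4 & 1 \end{pmatrix}\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} 10\\ 6 \end{pmatrix}.
\]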


In Exercise C30(a) you found that this coefficient matrix is invertible: 


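\[
\begin{pmatrix} 2 & 4\\ 4 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} -\tfrac1{14} & \tfrac27\\ \tfrac27 & -\tfrac17 \end{pmatrix}.
\]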


Multiplying both sides of the matrix form of the system on the left by the 
inverse of the coefficient matrix, we obtain 


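\[
\begin{pmatrix} -\tfrac1{14} & \tfrac27\\ \tfrac27 & -\tfrac17 \end{pmatrix}\begin{pmatrix} 2 & 4\\ 4 & 1 \end{pmatrix}\begin{pmatrix} x\\ y \end{pmatrix}
= \begin{pmatrix} -\tfrac1{14} & \tfrac27\\ \tfrac27 & -\tfrac17 \end{pmatrix}\begin{pmatrix} 10\\ 6 \end{pmatrix},
\]

that is,

\[
\begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} 1\\ 2 \end{pmatrix},
\]

or

\[
\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} 1\\ 2 \end{pmatrix}.
\]

So the system has the unique solution x = 1, y = 2.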




In general, suppose that Ax = b is the matrix form of a system of linear
equations, and that the coefficient matrix A is invertible. Then we can
multiply both sides of the equation Ax = b on the left by $A^{-1}$ to yield
$A^{-1}Ax = A^{-1}b$; that is, $x = A^{-1}b$. It seems, then, that the system has
the unique solution $x = A^{-1}b$.


However, we have to be careful before making this claim. Whenever we 
manipulate an equation in order to solve it, we have to be sure that the 
manipulation yields a second equation equivalent to the first (otherwise the 
two equations might have different solution sets). 


In this case, we have to be sure that 
\[
Ax = b \quad\text{if and only if}\quad x = A^{-1}b.
\]


We showed above that if Ax = b, then multiplying both sides on the left
by $A^{-1}$ yields $x = A^{-1}b$; in other words, we proved that Ax = b implies
$x = A^{-1}b$. It remains to prove that $x = A^{-1}b$ implies Ax = b, and
fortunately this is just as easy: if $x = A^{-1}b$, then multiplying both sides
of this equation on the left by A yields $Ax = AA^{-1}b$; that is, Ax = b.


So multiplying both sides of Ax = b on the left by $A^{-1}$ does yield an
equivalent equation. We have proved the following theorem. 


Theorem C8 


Let A be an invertible matrix. Then the system of linear equations
Ax = b has the unique solution $x = A^{-1}b$.
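Theorem C8 can be illustrated numerically with the 2 x 2 system solved above. In this Python sketch (an illustration only; the helper name `mat_vec` is our own), the inverse found in Exercise C30(a) is entered by hand, $x = A^{-1}b$ is computed exactly, and the result is substituted back to confirm the "x = A^{-1}b implies Ax = b" direction of the argument.

```python
from fractions import Fraction

# Coefficient matrix and right-hand side of 2x + 4y = 10, 4x + y = 6.
A = [[2, 4], [4, 1]]
b = [10, 6]
# Inverse of A, as found in Exercise C30(a).
A_inv = [[Fraction(-1, 14), Fraction(2, 7)],
         [Fraction(2, 7), Fraction(-1, 7)]]


def mat_vec(M, v):
    """Multiply the matrix M (list of rows) by the column vector v (list)."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


x = mat_vec(A_inv, b)
print(x)                    # [Fraction(1, 1), Fraction(2, 1)]
print(mat_vec(A, x) == b)   # True: x really satisfies Ax = b
```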


Exercise C31 


Use Theorem C8 to solve the following system of linear equations. 


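\[
\begin{aligned}
x + y + 2z &= 1\\
-x - 4z &= 2\\
3x + 2y + 10z &= -1
\end{aligned}
\]
In Worked Exercise C16 you saw that
\[
\begin{pmatrix} 1 & 1 & 2\\ -1 & 0 & -4\\ 3 & 2 & 10 \end{pmatrix}^{-1}
= \begin{pmatrix} 4 & -3 & -2\\ -1 & 2 & 1\\ -1 & \tfrac12 & \tfrac12 \end{pmatrix}.
\]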


In general, it is worth using the method of Theorem C8 only if we have 
already calculated the inverse of the coefficient matrix. To use the method 
of Theorem C8 to solve Ax = b, where A is an n x n invertible matrix, we 
first invert A. This involves row-reducing the matrix (A | I). We then 
calculate the matrix product $A^{-1}b$. On the other hand, the method of
Section 2 using Gauss-Jordan elimination involves only row-reducing the 
matrix (A | b) and so is usually quicker. 




Theorem C8 shows, in particular, that if the coefficient matrix A of a 
system of linear equations Ax = b is invertible, then the system has a 
unique solution. The converse of this result is also true — we prove this in 
the next theorem. 

This theorem gives some important relationships between the invertibility 
of a matrix and the number of solutions of a system of linear equations that 
has this matrix as its coefficient matrix. The theorem states that three 
conditions are equivalent: any one of the conditions implies any other one. 


Theorem C9 


Let A be an n x n matrix. Then the following statements are 
equivalent. 


(a) A is invertible. 


(b) The system Ax = b has a unique solution for each n x 1 
matrix b. 


(c) The system Ax = 0 has only the trivial solution. 


Proof We show that (a) ⇒ (b), (b) ⇒ (c) and (c) ⇒ (a), which
shows that the conditions are equivalent.

(a) ⇒ (b)

Suppose that A is an invertible n x n matrix. Then, by Theorem C8, for
any n x 1 matrix b, the system Ax = b has the unique solution $x = A^{-1}b$.

(b) ⇒ (c)

Suppose that the system Ax = b has a unique solution for each n x 1 
matrix b. Then, in particular, the homogeneous system Ax = 0 has a 
unique solution. But every homogeneous system has the trivial solution; 
thus this unique solution must be the trivial one. 

(c) ⇒ (a)

Suppose that the system Ax = 0 has only the trivial solution. Then 
row-reducing the augmented matrix 


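\[
\left(\begin{array}{cccc|c}
a_{11} & a_{12} & \cdots & a_{1n} & 0\\
a_{21} & a_{22} & \cdots & a_{2n} & 0\\
\vdots & \vdots & & \vdots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn} & 0
\end{array}\right)
\]

of the system must yield

\[
\left(\begin{array}{cccc|c}
1 & 0 & \cdots & 0 & 0\\
0 & 1 & \cdots & 0 & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & \cdots & 1 & 0
\end{array}\right)
\]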


since the solution is unique, there are no free unknowns, so every column
to the left of the vertical line must contain a leading 1; this is the
row-reduced matrix that corresponds to each unknown being 0. If we now
ignore the last column of each of the matrices appearing in this
row-reduction, we are left with a reduction of A to I.
Hence, by the Invertibility Theorem (Theorem C7), A is invertible.


4.4 Elementary matrices 


In this subsection you will meet a class of square matrices associated with 
elementary row operations and investigate their properties. 


We will use these matrices and their properties in Subsection 4.5 to help 
prove the Invertibility Theorem (Theorem C7). We will also find them 
useful later. 


Consider the following matrices: 


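\[
\begin{pmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 0 & 0\\ 0 & 5 & 0\\ 0 & 0 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 2 & 1 \end{pmatrix}.
\]

They are obtained by performing, on the 3 x 3 identity matrix, the
elementary row operations $r_1 \leftrightarrow r_2$, $r_2 \to 5r_2$ and $r_3 \to r_3 + 2r_2$,
respectively.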


Definition 
A matrix obtained by performing an elementary row operation on an 
identity matrix is an elementary matrix. 


The elementary row operation that is performed to obtain an elementary 
matrix from an identity matrix is called the elementary row operation 
associated with that elementary matrix. 


We now demonstrate the most important property of elementary matrices. 
Below, we show the effect of multiplying the matrix 


\[
A = \begin{pmatrix} 1 & 2 & 3 & 4\\ 5 & 6 & 7 & 8\\ 9 & 10 & 11 & 12 \end{pmatrix}
\]


on the left by each of the above elementary matrices. Notice that in each 
case, the resulting matrix is precisely the matrix that is obtained when the 
row operation associated with the elementary matrix is performed on A. 




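\[
\underbrace{\begin{pmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{pmatrix}}_{\substack{\text{elementary matrix}\\ \text{associated with } r_1 \leftrightarrow r_2}}
\underbrace{\begin{pmatrix} 1&2&3&4\\ 5&6&7&8\\ 9&10&11&12 \end{pmatrix}}_{A}
=
\underbrace{\begin{pmatrix} 5&6&7&8\\ 1&2&3&4\\ 9&10&11&12 \end{pmatrix}}_{\substack{\text{matrix obtained when}\\ r_1 \leftrightarrow r_2 \text{ is performed on } A}}
\]

\[
\underbrace{\begin{pmatrix} 1&0&0\\ 0&5&0\\ 0&0&1 \end{pmatrix}}_{\substack{\text{elementary matrix}\\ \text{associated with } r_2 \to 5r_2}}
\underbrace{\begin{pmatrix} 1&2&3&4\\ 5&6&7&8\\ 9&10&11&12 \end{pmatrix}}_{A}
=
\underbrace{\begin{pmatrix} 1&2&3&4\\ 25&30&35&40\\ 9&10&11&12 \end{pmatrix}}_{\substack{\text{matrix obtained when}\\ r_2 \to 5r_2 \text{ is performed on } A}}
\]

\[
\underbrace{\begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&2&1 \end{pmatrix}}_{\substack{\text{elementary matrix}\\ \text{associated with } r_3 \to r_3 + 2r_2}}
\underbrace{\begin{pmatrix} 1&2&3&4\\ 5&6&7&8\\ 9&10&11&12 \end{pmatrix}}_{A}
=
\underbrace{\begin{pmatrix} 1&2&3&4\\ 5&6&7&8\\ 19&22&25&28 \end{pmatrix}}_{\substack{\text{matrix obtained when}\\ r_3 \to r_3 + 2r_2 \text{ is performed on } A}}
\]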


There is nothing special about the above elementary matrices, or about 
the above matrix A. In the next exercise you will find that other 
elementary matrices behave similarly. 
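The behaviour just described can be checked mechanically. The Python sketch below is our own illustration, and the helper names `swap`, `scale` and `add_multiple` are hypothetical. It builds each elementary matrix by performing a row operation on I, exactly as in the definition, and compares EA with the result of performing the same operation on A. Python numbers rows from 0, so $r_1$ in the text is row 0 in the code.

```python
def identity(m):
    return [[int(i == j) for j in range(m)] for i in range(m)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


# Elementary row operations; each returns a new matrix and leaves M unchanged.
def swap(M, i, j):              # r_i <-> r_j
    M = [row[:] for row in M]
    M[i], M[j] = M[j], M[i]
    return M


def scale(M, i, c):             # r_i -> c r_i  (c non-zero)
    M = [row[:] for row in M]
    M[i] = [c * x for x in M[i]]
    return M


def add_multiple(M, i, j, c):   # r_i -> r_i + c r_j
    M = [row[:] for row in M]
    M[i] = [x + c * y for x, y in zip(M[i], M[j])]
    return M


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
ops = [lambda M: swap(M, 0, 1),               # r1 <-> r2
       lambda M: scale(M, 1, 5),              # r2 -> 5 r2
       lambda M: add_multiple(M, 2, 1, 2)]    # r3 -> r3 + 2 r2
for op in ops:
    E = op(identity(3))              # the associated elementary matrix
    print(matmul(E, A) == op(A))     # True (printed once per operation)
```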


Exercise C32 


1 2 3 


1 
3 
3 2 1 5 


Q AN 


Let A= ( ) and B= 


7 8 


(a) Write down the 2 x 2 elementary matrix associated with the
elementary row operation $r_1 \to 5r_1$.


Multiply A on the left by this elementary matrix, and check that the
resulting matrix is the same as the matrix obtained when $r_1 \to 5r_1$ is
performed on A.


(b) Write down the 4 x 4 elementary matrix associated with the
elementary row operation $r_2 \to r_2 + 3r_4$.


Multiply B on the left by this elementary matrix, and check that the
resulting matrix is the same as the matrix obtained when
$r_2 \to r_2 + 3r_4$ is performed on B.


Notice that the number of columns of the elementary matrix used must 
equal the number of rows of the matrix upon which the elementary 
operation is to be performed; that is, the elementary row operations should 
be applied to an appropriately sized identity matrix to obtain the 
elementary matrix required. 


In general, we have the following theorem, which we state without proof. 


Theorem C10 


Let E be an elementary matrix, and let A be any matrix with the 
same number of rows as E. Then the product EA is the same as the 
matrix obtained when the elementary row operation associated with 
E is performed on A. 


Theorem C10 tells us that if we perform an elementary row operation on a 
matrix A with m rows, then the resulting matrix is EA, where E is the 
m x m elementary matrix associated with the row operation. 


What happens if we perform a sequence of k elementary row operations on 
a matrix A with m rows? Let $E_1, E_2, \dots, E_k$ be the m x m elementary
matrices associated with the row operations in the sequence, in the same
order. The first row operation is performed on A, producing the matrix
$E_1A$; the second row operation is then performed on this matrix,
producing the matrix $E_2(E_1A) = E_2E_1A$; and so on. After the whole
sequence of k row operations has been performed, the resulting matrix is
$E_kE_{k-1}\cdots E_2E_1A$. Notice that the order of the elementary matrices in
this matrix product is the reverse of the order in which their associated 
row operations are performed. 


This fact will be useful later, and we record it as a corollary to 
Theorem C10. 


Corollary C11 


Let $E_1, E_2, \dots, E_k$ be the m x m elementary matrices associated with
a sequence of k elementary row operations carried out on a matrix A
with m rows, in the same order. Then, after the sequence of row
operations has been performed, the resulting matrix is

\[
E_kE_{k-1}\cdots E_2E_1A.
\]


For example, earlier, to illustrate the Invertibility Theorem (Theorem C7), 
we performed the sequence of row operations 


\[
r_2 \to r_2 - 2r_1, \qquad r_2 \to \tfrac13 r_2, \qquad r_1 \to r_1 - 3r_2,
\]
on the matrix
\[
A = \begin{pmatrix} 1 & 3\\ 2 & 9 \end{pmatrix}
\]
to produce the identity matrix $I_2$.


By Corollary C11 we have the following, which you should check by 
evaluating the product on the right-hand side. 


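\[
\begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & -3\\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0\\ 0 & \tfrac13 \end{pmatrix}
\begin{pmatrix} 1 & 0\\ -2 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 3\\ 2 & 9 \end{pmatrix}
\]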
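The product above can be evaluated with a few lines of Python (an illustrative sketch; `matmul` is our own helper). The same code also forms the product $E_3E_2E_1$ without A, which turns out to be the inverse of A found earlier in this section, as part (b) of the Invertibility Theorem predicts.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[1, 3], [2, 9]]
E1 = [[1, 0], [-2, 1]]                # r2 -> r2 - 2 r1
E2 = [[1, 0], [0, Fraction(1, 3)]]    # r2 -> (1/3) r2
E3 = [[1, -3], [0, 1]]                # r1 -> r1 - 3 r2

# Corollary C11: the matrices appear in reverse order of the operations.
print(matmul(E3, matmul(E2, matmul(E1, A))) == [[1, 0], [0, 1]])  # True

# The same product of elementary matrices, without A, is A^{-1}.
B = matmul(E3, matmul(E2, E1))
print(B == [[3, -1], [Fraction(-2, 3), Fraction(1, 3)]])          # True
```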




We now explore some other useful connections between elementary row 
operations and elementary matrices. We begin by introducing a further 
property of elementary row operations. 


In the following example, the second elementary row operation undoes the 
effect of the first. 


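\[
\begin{array}{lrrr}
r_1 & 1 & 2 & 3\\
r_2 & 4 & 5 & 6\\[6pt]
 & 1 & 2 & 3\\
r_2 \to r_2 + 3r_1 & 7 & 11 & 15\\[6pt]
 & 1 & 2 & 3\\
r_2 \to r_2 - 3r_1 & 4 & 5 & 6
\end{array}
\]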


In fact, given any elementary row operation, it is easy to write down an 
inverse elementary row operation that undoes the effect of the first, as 
summarised in the following table. 


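\[
\begin{array}{ll}
\text{Elementary row operation} & \text{Inverse elementary row operation}\\
\hline
r_i \leftrightarrow r_j & r_i \leftrightarrow r_j\\
r_i \to c\,r_i \quad (c \neq 0) & r_i \to (1/c)\,r_i\\
r_i \to r_i + c\,r_j & r_i \to r_i - c\,r_j
\end{array}
\]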


Exercise C33 


Write down the inverse of each of the following elementary row operations. 
Check your answer in each case by carrying out the sequence of two row 
operations on the matrix 


\[
A = \begin{pmatrix} 1 & 2 & 3\\ 4 & 5 & 6 \end{pmatrix}.
\]


(a) $r_1 \to r_1 - 2r_2$ (b) $r_1 \leftrightarrow r_2$ (c) $r_2 \to -3r_2$


Note that if two elementary row operations are such that the second is the
inverse of the first, then the first is the inverse of the second, so it makes
sense to say that they are inverses of each other, or that they form an
inverse pair. For example, the inverse of $r_2 \to r_2 + 3r_1$ is $r_2 \to r_2 - 3r_1$,
and the inverse of $r_2 \to r_2 - 3r_1$ is $r_2 \to r_2 - (-3)r_1$, that is, $r_2 \to r_2 + 3r_1$.
So $r_2 \to r_2 + 3r_1$ and $r_2 \to r_2 - 3r_1$ are inverses of each other.


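Now consider the following pair of 2 x 2 elementary matrices associated
with the inverse pair of elementary row operations $r_2 \to r_2 + 3r_1$ and
$r_2 \to r_2 - 3r_1$:

\[
\begin{pmatrix} 1 & 0\\ 3 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0\\ -3 & 1 \end{pmatrix}.
\]

These two matrices are themselves inverses of each other, as we can easily
check:

\[
\begin{pmatrix} 1 & 0\\ 3 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0\\ -3 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0\\ -3 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0\\ 3 & 1 \end{pmatrix}.
\]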


This connection between inverse pairs of elementary row operations and 
inverse pairs of elementary matrices holds in general. 
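Before this is stated in general, here is a small numerical check in Python of one inverse pair of each type from the table of inverse row operations (an illustrative sketch; the row-operation helpers are our own, the matrices are 3 x 3, and rows are numbered from 0). For each pair, the two associated elementary matrices are multiplied in both orders.

```python
from fractions import Fraction


def identity(m):
    return [[int(i == j) for j in range(m)] for i in range(m)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def swap(M, i, j):              # r_i <-> r_j
    M = [row[:] for row in M]
    M[i], M[j] = M[j], M[i]
    return M


def scale(M, i, c):             # r_i -> c r_i
    M = [row[:] for row in M]
    M[i] = [c * x for x in M[i]]
    return M


def add_multiple(M, i, j, c):   # r_i -> r_i + c r_j
    M = [row[:] for row in M]
    M[i] = [x + c * y for x, y in zip(M[i], M[j])]
    return M


# (operation, inverse operation) pairs, one of each type in the table.
pairs = [
    (lambda M: swap(M, 0, 2), lambda M: swap(M, 0, 2)),
    (lambda M: scale(M, 1, Fraction(5)), lambda M: scale(M, 1, Fraction(1, 5))),
    (lambda M: add_multiple(M, 1, 0, 3), lambda M: add_multiple(M, 1, 0, -3)),
]
I3 = identity(3)
for op, inv in pairs:
    E1, E2 = op(I3), inv(I3)    # elementary matrices of the pair
    print(matmul(E1, E2) == I3 and matmul(E2, E1) == I3)  # True
```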


Theorem C12 


Let $E_1$ and $E_2$ be elementary matrices of the same size whose
associated elementary row operations are inverses of each other. Then
$E_1$ and $E_2$ are inverses of each other.


Proof In this proof we refer to the row operations associated with $E_1$
and $E_2$ as row operation 1 and row operation 2, respectively.


By Corollary C11, $E_2E_1I$ is the matrix produced when row operations 1
and 2 are performed, in that order, on I. Similarly, $E_1E_2I$ is the matrix
produced when row operations 2 and 1 are performed, in that order, on I.
But each of these two row operations undoes the effect of the other, so
$E_2E_1I = I$ and $E_1E_2I = I$; that is,

\[
E_2E_1 = I = E_1E_2.
\]

Thus $E_1$ and $E_2$ are inverses of each other.


Theorem C12 has the following corollary. 


Corollary C13 


Every elementary matrix is invertible, and its inverse is also an 
elementary matrix. 


Proof Let E be an elementary matrix. Then E has an associated 
elementary row operation. This associated elementary row operation has 
an inverse operation, and the elementary matrix of the same size as E 
associated with this inverse operation is the inverse of E, by
Theorem C12.


Exercise C34 


Use the method suggested by the proof of Corollary C13 to find the inverse 
of the elementary matrix 




4.5 Proof of the Invertibility Theorem 


We are now ready to prove the Invertibility Theorem, using elementary 
matrices and their properties. We first remind you of the theorem. 


Theorem C7 Invertibility Theorem 


(a) A square matrix is invertible if and only if its row-reduced form 
is I. 

(b) Any sequence of elementary row operations that transforms a 
matrix A to I also transforms I to $A^{-1}$.


Proof Let A be an n x n matrix, and let the row-reduced form of A
be U. Let $E_1, E_2, \dots, E_k$ be the n x n elementary matrices associated with
a sequence of k elementary row operations that transforms A to U. Then,
by Corollary C11,

\[
U = BA,
\]

where $B = E_kE_{k-1}\cdots E_2E_1$. Now B is invertible, since every elementary
matrix is invertible (by Corollary C13), and a product of invertible
matrices is invertible (by Theorem C5).


(a) We start by proving the ‘only if’ statement.
First we show that if A is invertible, then U = I. 


Suppose that A is invertible. Then U is a product of invertible 
matrices (B and A); hence U is invertible. 


Therefore U does not have a zero row (since, by Theorem C4, a 
square matrix with a zero row is not invertible), and so from the 
definition of row-reduced form, it has a leading 1 in each of its n rows. 
Each of these n leading ones lies in a different column; so, since U has 
only n columns, each column must contain a leading 1. Thus the 
leading 1 in the top row must lie in the left-most position, and the 
leading 1 in each subsequent row must lie just one position to the 
right of the leading 1 in the row immediately above. All the entries 
above and below these leading ones are zeros. Hence U = I. 


We now prove the ‘if’ statement.
Next, we show that if U = I, then A is invertible.


Suppose that U = I. Then
\[
I = BA. \tag{5}
\]

Multiplying both sides of equation (5) on the left by $B^{-1}$ yields
\[
B^{-1}I = B^{-1}BA,
\]
that is,
\[
B^{-1} = A.
\]
Multiplying both sides of this equation on the right by B yields
\[
B^{-1}B = AB,
\]
that is,
\[
I = AB. \tag{6}
\]

Equations (5) and (6) together tell us that A is invertible, and that
$A^{-1} = B$.


(b) It follows from the proof of part (a) that if U = I, then A is invertible
and $A^{-1} = B$; that is, $A^{-1} = E_kE_{k-1}\cdots E_2E_1$.
This equation can be written as
\[
A^{-1} = E_kE_{k-1}\cdots E_2E_1I,
\]
which tells us that $A^{-1}$ is the matrix produced by performing on I
the sequence of row operations associated with $E_1, E_2, \dots, E_k$.


5 Determinants 


In this section you will revise the determinant of a 2 x 2 matrix, and see 
how this concept extends to n x n matrices. 


5.1 Systems of linear equations and determinants


Determinants arise naturally in the study of systems of linear equations. 


In 1693 Gottfried Wilhelm Leibniz (1646-1716) wrote a letter to the 
Marquis de l’Hopital in which he demonstrated a method for solving a 
system of three simultaneous equations which involved calculating 
what we now call the determinant of a 3 x 3 matrix, and went on to 
give a general (although rather unclear) rule for calculating the 
determinant of an n x n matrix. 


The actual term ‘determinant’ was introduced by Carl Friedrich 
Gauss (1777-1855) in his Disquisitiones Arithmeticae of 1801, but it
was Augustin-Louis Cauchy (1789-1857) who in 1812, adapting the 
term from Gauss, first used it in its modern sense and began to 
develop a proper theory of determinants. 




[Portraits: Gabriel Cramer; Colin Maclaurin]


This connection between determinants and systems of linear equations was 
made explicit by Gabriel Cramer in a method known as Cramer’s rule. If a 
unique solution exists for a system of n linear equations in n unknowns, 
then this solution can be found by evaluating determinants. You will see 
Cramer’s rule for a system of two linear equations in two unknowns; it is 
rather unwieldy to use for larger systems. However, Cramer’s rule gives an 
expression for each unknown individually, so it makes it possible to find 
one unknown without solving the whole system. 


Cramer’s rule is named after the Swiss mathematician Gabriel Cramer 
(1704-1752) who presented it in his Introduction à l'analyse des lignes 
courbes algébriques (Introduction to the Analysis of Algebraic Curved 
Lines) of 1750, although the Scottish mathematician Colin Maclaurin 
(1698-1746) had already described the rule in his Treatise of Algebra 
(1748) written in 1730 but not published until after his death. 


Determinant of a 2x2 matrix 


We start by looking at a system of two equations in two unknowns, where 
the coefficients of the system are real numbers. 


\[
\begin{aligned}
a_1x + b_1y &= c_1\\
a_2x + b_2y &= c_2
\end{aligned}
\qquad\text{or}\qquad
\begin{pmatrix} a_1 & b_1\\ a_2 & b_2 \end{pmatrix}\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} c_1\\ c_2 \end{pmatrix}.
\]
Using Gauss-Jordan elimination, the following solution can be found:
\[
x = \frac{c_1b_2 - b_1c_2}{a_1b_2 - b_1a_2}, \qquad
y = \frac{a_1c_2 - c_1a_2}{a_1b_2 - b_1a_2}, \tag{7}
\]
provided that $a_1b_2 - b_1a_2$ is not zero. (You can check this solution by
substitution.) We call the expression $a_1b_2 - b_1a_2$ the determinant of the
coefficient matrix. Each term in this expression contains the letters a and
b, and the subscripts 1 and 2, in some order.


The definition we give for the determinant of a 2 x 2 matrix is in a form 
that is easier to remember. 


Definition 


The determinant of a 2 x 2 matrix
\[
A = \begin{pmatrix} a & b\\ c & d \end{pmatrix}
\]
is
\[
\det A = \begin{vmatrix} a & b\\ c & d \end{vmatrix} = ad - bc.
\]




You might find it helpful to remember the scheme shown in Figure 15.
We write det A, and use vertical bars ‘| ... |’ around the matrix entries, in
place of the round brackets, to denote the determinant. Some texts use the
notation |A| rather than det A.

[Figure 15 Scheme for 2 x 2 determinant]

For example, let $A = \begin{pmatrix} 1 & 2\\ 3 & 4 \end{pmatrix}$; then
\[
\det A = \begin{vmatrix} 1 & 2\\ 3 & 4 \end{vmatrix} = (1 \times 4) - (2 \times 3) = -2.
\]
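The definition translates directly into code. Here is a minimal Python sketch (the function name `det2` is our own) that reproduces the example above:

```python
def det2(M):
    """Determinant of the 2 x 2 matrix [[a, b], [c, d]], namely ad - bc."""
    (a, b), (c, d) = M
    return a * d - b * c


print(det2([[1, 2], [3, 4]]))  # -2
```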


Exercise C35 


Evaluate the determinant of each of the following matrices. 
5 1 10 —4 7 3 
i ({ À (e e 2) (©) (i J 


Notice that the numerators of the solutions for x and y in (7) can also be 
written as determinants: 


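\[
c_1b_2 - b_1c_2 = \begin{vmatrix} c_1 & b_1\\ c_2 & b_2 \end{vmatrix}
\quad\text{and}\quad
a_1c_2 - c_1a_2 = \begin{vmatrix} a_1 & c_1\\ a_2 & c_2 \end{vmatrix}.
\]
So we could write these solutions as
\[
x = \frac{\begin{vmatrix} c_1 & b_1\\ c_2 & b_2 \end{vmatrix}}{\begin{vmatrix} a_1 & b_1\\ a_2 & b_2 \end{vmatrix}},
\qquad
y = \frac{\begin{vmatrix} a_1 & c_1\\ a_2 & c_2 \end{vmatrix}}{\begin{vmatrix} a_1 & b_1\\ a_2 & b_2 \end{vmatrix}}.
\]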


This is Cramer’s rule for a system of two linear equations in two 
unknowns. The numerator of the expression for x is the determinant of the 
coefficient matrix of the system with the first column replaced by the 
constant terms. Similarly, the numerator of the expression for y is the 
determinant of the coefficient matrix of the system with the second column 
replaced by the constant terms. 
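Cramer's rule for two equations in two unknowns can be sketched in Python as follows. This is an illustration with our own function names, using exact `Fraction` arithmetic and assuming integer coefficients. As a check, it is applied to the system 2x + 4y = 10, 4x + y = 6 from Subsection 4.3, whose solution is x = 1, y = 2.

```python
from fractions import Fraction


def det2(M):
    (a, b), (c, d) = M
    return a * d - b * c


def cramer2(a1, b1, c1, a2, b2, c2):
    """Solve a1 x + b1 y = c1, a2 x + b2 y = c2 by Cramer's rule.

    Returns None when the coefficient determinant is zero.
    """
    D = det2([[a1, b1], [a2, b2]])
    if D == 0:
        return None
    x = Fraction(det2([[c1, b1], [c2, b2]]), D)   # first column replaced by constants
    y = Fraction(det2([[a1, c1], [a2, c2]]), D)   # second column replaced by constants
    return x, y


print(cramer2(2, 4, 10, 4, 1, 6))  # (Fraction(1, 1), Fraction(2, 1))
```

Each unknown is computed independently of the other, which is the feature of Cramer's rule mentioned earlier in this subsection.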


In Subsection 5.4 we will prove that a 2 x 2 matrix is invertible if and only 
if its determinant is non-zero. For an invertible 2 x 2 matrix, there is a 
quick way to find the inverse using the determinant. You can verify the 
following strategy by checking that the expression given below for $A^{-1}$
does indeed satisfy $AA^{-1} = I = A^{-1}A$.




[Figure 16 A parallelogram with area 5]


Strategy C4 
To find the inverse of a 2 x 2 matrix 
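\[
A = \begin{pmatrix} a & b\\ c & d \end{pmatrix}
\]
with $\det A = ad - bc \neq 0$:
• interchange the diagonal entries


• multiply the non-diagonal entries by −1


• divide by the determinant of A, giving


\[
A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b\\ -c & a \end{pmatrix}.
\]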
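Strategy C4 is short enough to write out in full. In this Python sketch (the function name `inverse2` is our own, and entries are assumed to be integers so that `Fraction` represents the result exactly), applying it to the matrix A from Subsection 4.2 reproduces the inverse found there by row-reduction.

```python
from fractions import Fraction


def inverse2(M):
    """Inverse of a 2 x 2 integer matrix by Strategy C4, or None if det = 0."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        return None                       # not invertible
    # Interchange the diagonal entries, negate the others, divide by det.
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


print(inverse2([[1, 3], [2, 9]]))
# [[Fraction(3, 1), Fraction(-1, 1)], [Fraction(-2, 3), Fraction(1, 3)]]
print(inverse2([[1, 2], [2, 4]]))  # None
```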


Exercise C36 


Determine whether or not each of the following matrices is invertible, and 
find the inverse if it exists. 


oka olid ©( 4) 


There is also a geometric interpretation of the determinant: let (a, c) and
(b, d) be two position vectors. Then the absolute value of the determinant

\[
\begin{vmatrix} a & b\\ c & d \end{vmatrix}
\]

gives the area of the parallelogram with adjacent sides given by these
position vectors. For example, the parallelogram shown in Figure 16 with
vertices (0, 0), (2, 1), (1, 3) and (3, 4) has area 5, since the base and height
are both equal to √5. Now, since one of the vertices is at the origin, the
position vectors (2, 1) and (1, 3) determine the parallelogram, and

\[
\begin{vmatrix} 2 & 1\\ 1 & 3 \end{vmatrix} = (2 \times 3) - (1 \times 1) = 5.
\]


Determinant of a 3x3 matrix


We now consider the following system of three linear equations in three
unknowns:
\[
\begin{aligned}
a_1x + b_1y + c_1z &= d_1\\
a_2x + b_2y + c_2z &= d_2\\
a_3x + b_3y + c_3z &= d_3
\end{aligned}
\qquad\text{or}\qquad
\begin{pmatrix} a_1 & b_1 & c_1\\ a_2 & b_2 & c_2\\ a_3 & b_3 & c_3 \end{pmatrix}
\begin{pmatrix} x\\ y\\ z \end{pmatrix}
= \begin{pmatrix} d_1\\ d_2\\ d_3 \end{pmatrix}.
\]


Again we can find the solution, if one exists, using Gauss-Jordan 
elimination. It turns out that the solutions for x, y and z all have the same 
denominator: 


\[
a_1b_2c_3 - a_1c_2b_3 - b_1a_2c_3 + b_1c_2a_3 + c_1a_2b_3 - c_1b_2a_3.
\]


This is the determinant of the 3 x 3 coefficient matrix. Notice that each 
term in this expression for the denominator contains the letters a, b and c, 
and the subscripts 1, 2 and 3, in some order. 


The definition we give for the determinant of a 3 x 3 matrix is expressed in 
terms of three 2 x 2 determinants. This is the easiest way to remember the 
definition. 


Definition 


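The determinant of a 3 x 3 matrix
\[
A = \begin{pmatrix} a_1 & b_1 & c_1\\ a_2 & b_2 & c_2\\ a_3 & b_3 & c_3 \end{pmatrix}
\]
is
\[
\det A = a_1\begin{vmatrix} b_2 & c_2\\ b_3 & c_3 \end{vmatrix}
- b_1\begin{vmatrix} a_2 & c_2\\ a_3 & c_3 \end{vmatrix}
+ c_1\begin{vmatrix} a_2 & b_2\\ a_3 & b_3 \end{vmatrix}.
\]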


Notice the minus sign before the second term on the right-hand side. 
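The definition can be sketched in Python as follows (our own illustration; `det2` and `det3` are hypothetical helper names). Applied to the two matrices of Worked Exercise C16, it gives 2 for the matrix A, which was shown there to be invertible, and 0 for the matrix B, which was shown not to be invertible.

```python
def det2(M):
    (a, b), (c, d) = M
    return a * d - b * c


def det3(M):
    """3 x 3 determinant: a1|b2 c2; b3 c3| - b1|a2 c2; a3 c3| + c1|a2 b2; a3 b3|."""
    (a1, b1, c1), (a2, b2, c2), (a3, b3, c3) = M
    return (a1 * det2([[b2, c2], [b3, c3]])
            - b1 * det2([[a2, c2], [a3, c3]])      # note the minus sign
            + c1 * det2([[a2, b2], [a3, b3]]))


print(det3([[1, 1, 2], [-1, 0, -4], [3, 2, 10]]))  # 2
print(det3([[1, 3, 5], [3, 1, 7], [2, 4, 8]]))     # 0
```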


Worked Exercise C17 


Evaluate the determinant of each of the following 3 x 3 matrices. 


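\[
\text{(a)}\ \begin{pmatrix} 1 & 2 & 1\\ 3 & 1 & -1\\ -2 & 1 & 1 \end{pmatrix}
\qquad
\text{(b)}\ \begin{pmatrix} 4 & 0 & 1\\ 0 & -1 & 2\\ 2 & 1 & 3 \end{pmatrix}
\]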




Exercise C37 


Evaluate the determinant of each of the following 3 x 3 matrices. 


3 2 1 2 10 0 
(a) [4 0 -1 (by [3 -1 2 
0—1 1 5 92 


Determinants of larger matrices (4 x 4, and so on) are defined similarly in 
terms of smaller determinants in the next subsection. Note that 
determinants are defined only for square matrices. As with 2 x 2 matrices, 
determinants of larger matrices can be used to solve systems of linear 
equations. 


5.2 Evaluating determinants 


You have seen that although the determinant of a 2 x 2 matrix is simple to 
evaluate, the determinant of a 3 x 3 matrix is quite complicated. 
Determinants of larger matrices become increasingly more complicated as 
the size of the matrix increases. You will mainly be finding determinants of 
matrices of size 2 x 2 and 3 x 3. In this subsection we develop a strategy 
for evaluating determinants by expressing them eventually in terms of 
determinants of 2 x 2 matrices, as with the definition of the determinant of 
a 3 x 3 matrix above. 


Cofactors 


A submatrix is a matrix formed from another matrix with some of the 
rows and/or columns removed; submatrices are useful when evaluating 
determinants. 


We can express the determinant of a 3 x 3 matrix $A = (a_{ij})$ conveniently as
\[
\det A = a_{11}A_{11} + a_{12}A_{12} + a_{13}A_{13}.
\]


The elements $A_{11}$, $A_{12}$ and $A_{13}$ in this expression are called the cofactors
of the elements $a_{11}$, $a_{12}$ and $a_{13}$, respectively. We can see from the
definition of the determinant that these cofactors are themselves
determinants with a + or − sign attached. In fact, $A_{1j}$ is $(-1)^{1+j}$ times
the determinant of the submatrix of A formed by removing the top row and
one column of A, namely the row and column containing the element $a_{1j}$.


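Thus we have

\[
A_{11} = \begin{vmatrix} a_{22} & a_{23}\\ a_{32} & a_{33} \end{vmatrix}, \qquad
A_{12} = -\begin{vmatrix} a_{21} & a_{23}\\ a_{31} & a_{33} \end{vmatrix}, \qquad
A_{13} = \begin{vmatrix} a_{21} & a_{22}\\ a_{31} & a_{32} \end{vmatrix}.
\]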


In fact, there is a cofactor associated with each entry of any square matrix. 


Definition 
Let $A = (a_{ij})$ be an n x n matrix. The cofactor $A_{ij}$ associated with
the entry $a_{ij}$ is
\[
A_{ij} = (-1)^{i+j}\det \mathbf{A}_{ij},
\]
where $\mathbf{A}_{ij}$ is the (n − 1) x (n − 1) submatrix of A resulting when the
ith row and jth column (the row and column containing the entry $a_{ij}$)
are removed.


For example, for the cofactor $A_{23}$ of the 4 x 4 matrix $A = (a_{ij})$ we have

\[
A_{23} = -\begin{vmatrix} a_{11} & a_{12} & a_{14}\\ a_{31} & a_{32} & a_{34}\\ a_{41} & a_{42} & a_{44} \end{vmatrix}.
\]
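Forming the submatrix is a purely mechanical step. The Python sketch below (our own helper names; rows and columns are numbered from 1 to match the text) rebuilds the example above, using strings such as 'a11' as stand-ins for the entries.

```python
def submatrix(A, i, j):
    """Remove row i and column j from A (both numbered from 1, as in the text)."""
    return [row[:j - 1] + row[j:]
            for r, row in enumerate(A, start=1) if r != i]


def cofactor_sign(i, j):
    """The sign (-1)^(i+j) attached to the cofactor A_ij."""
    return (-1) ** (i + j)


# Symbolic 4 x 4 matrix whose entries are the labels 'a11', 'a12', ...
A = [[f"a{i}{j}" for j in range(1, 5)] for i in range(1, 5)]
print(cofactor_sign(2, 3))   # -1
for row in submatrix(A, 2, 3):
    print(row)
# ['a11', 'a12', 'a14']
# ['a31', 'a32', 'a34']
# ['a41', 'a42', 'a44']
```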


Exercise C38 


Write down expressions for the cofactors A13 and A45 of the matrix 


\[
A = \begin{pmatrix}
1 & 2 & 3 & 4 & 5\\
2 & 3 & 4 & 5 & 1\\
3 & 4 & 5 & 1 & 2\\
4 & 5 & 1 & 2 & 3\\
5 & 1 & 2 & 3 & 4
\end{pmatrix}
\]


(Do not attempt to evaluate these expressions!) 




Determinant of an nxn matrix 


You have seen that we can use cofactors to evaluate the determinant of a 
3 x 3 matrix. Determinants of larger matrices can be evaluated in a similar 
way. 


Definition 


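The determinant of an n x n matrix $A = (a_{ij})$ is

\[
\det A = \begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}
= a_{11}A_{11} + a_{12}A_{12} + \cdots + a_{1n}A_{1n}.
\]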


Do not forget the minus sign that is a part of alternate cofactors! 


The determinant of a matrix is a complicated string of terms. The 
definition above collects the terms into manageable expressions using the 
cofactors of the entries of the top row; when we write the determinant in 
this way, we say that we are expanding along the top row. 


There are alternative expansions for the determinant of a square matrix 
that collect the terms in different ways — however, the resulting value for 
the determinant is always the same. 


We are now in a position to evaluate the determinant of any square matrix 
using the following strategy. 


Strategy C5 
To evaluate the determinant of an n x n matrix: 


1. expand along the top row to express the n x n determinant in 
terms of n determinants of size (n — 1) x (n — 1) 


2. expand along the top row of each of the resulting determinants 


3. repeatedly apply step 2 until the only determinants in the 
expression are of size 2 x 2 


4. evaluate the final expression. 
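Strategy C5 is naturally recursive. Here is a minimal Python sketch (our own illustration; the name `det` is hypothetical, and matrices are assumed to be at least 2 x 2) that expands along the top row until only 2 x 2 determinants remain. A full expansion of an n x n determinant has n! terms, so this approach is only practical for small matrices, which fits the remark at the start of Subsection 5.2 that you will mainly work with 2 x 2 and 3 x 3 determinants.

```python
def det(A):
    """Determinant of an n x n matrix (n >= 2) by expanding along the top row."""
    n = len(A)
    if n == 2:                          # step 4: evaluate 2 x 2 determinants
        return A[0][0] * A[1][1] - A[0][1] * A[1][0]
    total = 0
    for j in range(n):                  # columns numbered from 0 here
        # Submatrix: delete the top row and column j.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # In the text's numbering this is a_{1,j+1}, with sign (-1)^(1+j+1) = (-1)^j.
        total += (-1) ** j * A[0][j] * det(minor)
    return total


L = [[2, 0, 0, 0],
     [1, 3, 0, 0],
     [4, 5, 1, 0],
     [7, 8, 9, 2]]
print(det(L))                                      # 12
print(det([[1, 1, 2], [-1, 0, -4], [3, 2, 10]]))   # 2
```

The 4 x 4 example is our own lower-triangular matrix, chosen because only one term of each top-row expansion is non-zero, so the answer 2 × 3 × 1 × 2 = 12 is easy to confirm by hand.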


Figure 17 gives a scheme for an n x n determinant. 


[Figure 17 Scheme for an n x n determinant]


Worked Exercise C18 illustrates Strategy C5, before you are asked to find 
the determinant of a 4 x 4 matrix in Exercise C39. 


Worked Exercise C18 


Evaluate the following determinant. 
\[
\begin{vmatrix}
2 & 0 & 3 & 5\\
0 & 4 & -1 & 0\\
1 & 0 & 0 & 1\\
0 & 2 & 1 & 1
\end{vmatrix}
\]


Exercise C39 


Evaluate the following determinant. 


\[
\begin{vmatrix}
0 & 2 & 1 & -1\\
-3 & 0 & 0 & -1\\
1 & 0 & 1 & 0\\
0 & 4 & 2 & 0
\end{vmatrix}
\]




5.3 Properties of determinants 


Suppose that A and B are two n x n matrices. Are there any relationships 
between det A, det B, det(A + B) and det(AB)? 


Exercise C40 


—3 1 1 1 
Let A = ( 9 e adB=(_; s 
Evaluate det A, det B, det(A + B), det(AB) and (det A)(det B). 


Comment on your results. 


You should have found in the solution to Exercise C40 that there does not 
appear to be a simple relationship for the addition of determinants; that is, 
we cannot easily express det(A + B) in terms of det A and det B. 
However, the identity 


det(AB) = (det A) (det B) 


does hold for all square matrices A and B of the same size. The simplicity 
of this result is somewhat surprising, given the complexity of the 
definitions of matrix multiplication and the determinant. 


We collect together, without proof, some results about determinants in the 
following theorem. 


Theorem C14 


Let A and B be two square matrices of the same size. Then the 
following hold: 


(a) det(AB) = (det A)(det B)
(b) det I = 1
(c) $\det A^{T} = \det A$.
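Theorem C14 is stated without proof, but its three parts are easy to test numerically. The Python sketch below is an illustration, not a proof. It checks randomly chosen integer matrices from a seeded generator, using a top-row cofactor expansion that, for brevity, stops at 1 x 1 determinants, whose value is taken to be the single entry (expanding a 2 x 2 determinant this way gives ad − bc, as before).

```python
import random


def det(A):
    """Determinant by expansion along the top row; a 1 x 1 determinant is its entry."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(A):
    return [list(col) for col in zip(*A)]


rng = random.Random(2018)       # fixed seed so the run is repeatable
for n in (2, 3, 4):
    I = [[int(i == j) for j in range(n)] for i in range(n)]
    assert det(I) == 1                                   # part (b)
    for _ in range(20):
        A = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
        B = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
        assert det(matmul(A, B)) == det(A) * det(B)      # part (a)
        assert det(transpose(A)) == det(A)               # part (c)
print("all checks passed")
```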


Elementary operations and determinants 


Earlier, in Theorem C10, you saw that multiplication on the left by an 
elementary matrix has the same effect as applying the associated 
elementary row operation. Here, we use elementary matrices to prove some 
useful results about determinants. 


Exercise C41 


Evaluate the following determinants, where k is any real number. 
1 0 0 0 


(a) ©) |p 
0 


or © 
re Oo © 


1 
a 
0 


100 () 
0 k 0 lh 1 
001 


The results of Exercise C41 are particular cases of the following theorem. 
The proof is not hard, but it is not very illuminating, so is not given here. 


Theorem C15 

Let E be an elementary matrix, and let k be a non-zero real number. 
(a) If E results from interchanging two rows of I, then det E = −1.
(b) If E results from multiplying a row of I by k, then det E = k. 


(c) If E results from adding k times one row of I to another row, 
then det E = 1. 
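The three cases of Theorem C15 can be checked in the same spirit. This Python sketch is our own illustration, using 4 x 4 matrices and k = 7 chosen arbitrarily. It builds one elementary matrix of each kind from I and evaluates its determinant by top-row expansion.

```python
def det(A):
    """Determinant by expansion along the top row; a 1 x 1 determinant is its entry."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


n, k = 4, 7
I = [[int(i == j) for j in range(n)] for i in range(n)]

E_swap = [row[:] for row in I]
E_swap[0], E_swap[2] = E_swap[2], E_swap[0]                    # r1 <-> r3

E_scale = [row[:] for row in I]
E_scale[1] = [k * x for x in E_scale[1]]                       # r2 -> k r2

E_add = [row[:] for row in I]
E_add[3] = [x + k * y for x, y in zip(E_add[3], E_add[0])]     # r4 -> r4 + k r1

print(det(E_swap), det(E_scale), det(E_add))                   # -1 7 1
```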


Zeros in a matrix greatly simplify the calculation of the determinant. If an 
entire row of the matrix is zero, then all the terms vanish and the 
determinant is zero. Some other matrices with zero determinant are also 
easy to recognise. 


Theorem C16 

Let A be a square matrix. Then det A = 0 if any of the following hold: 
(a) A has an entire row (or column) of zeros 

(b) A has two equal rows (or columns) 


(c) A has two proportional rows (or columns). 


Proof We prove the statements for rows. The results for columns follow, 
as Theorem C14(c) states that taking the transpose does not alter the 
determinant of a matrix. 


(a) We follow Strategy C5 and expand along the top row of A, and
continue by expanding along the top row of the resulting
determinants until the only determinants in the expression are of size
2 x 2. The first term of the full expansion is therefore the product
$a_{11}a_{22}a_{33}\cdots a_{nn}$, and each other term similarly comprises a product
containing one entry from each row and each column.




Each term in the full expansion of the determinant of A is a product 
containing one entry from each row and each column of A. If an 
entire row of A is zero, then each term of this expansion contains at 
least one zero, and so each term is zero. Hence the determinant of A 
is equal to zero. 

(b) If the ith and jth rows of the matrix A are equal, then A remains the
same if these rows are interchanged. Let E be the elementary matrix 
obtained by interchanging the ith and jth rows of I. Then EA = A. 
Using Theorems C14 and C15, we have 


det A = det(EA) = (det E)(det A) = −1 × det A = −det A.
This implies that det A = 0, as required.


(c) Two rows (or columns) of a matrix are proportional when one is a multiple of the other.

Suppose that the ith row of A is equal to k times the jth row. If k = 0, then the ith row of A is zero, so det A = 0 by (a). Otherwise, let E be the elementary matrix obtained from I by multiplying the ith row by 1/k. Then the ith and jth rows of the matrix EA are equal. The determinant of this matrix EA is zero, by (b) above. Using Theorem C14, we have

(det E)(det A) = det(EA) = 0. 


Now det E = 1/k ≠ 0, by Theorem C15. This implies that det A = 0, as required. ■
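To see Theorem C16 in action, the sketch below evaluates the determinants of three sample 3 × 3 matrices: one with a zero row, one with two equal rows and one with proportional rows. It also evaluates the determinants of their transposes, which covers the column versions by Theorem C14(c). The matrices are illustrative choices and are not taken from the text.

```python
def det(m):
    """Determinant by cofactor expansion along the top row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def transpose(a):
    return [list(col) for col in zip(*a)]


zero_row = [[1, 2, 3], [0, 0, 0], [4, 5, 6]]
equal_rows = [[1, 2, 3], [4, 5, 6], [1, 2, 3]]
proportional_rows = [[2, 1, 3], [5, 0, 7], [6, 3, 9]]  # row 3 = 3 x row 1

for m in (zero_row, equal_rows, proportional_rows):
    # The rows of the transpose are the columns of m,
    # so transposing gives the column versions of (a)-(c).
    print(det(m), det(transpose(m)))  # 0 0
```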


Exercise C42 


Evaluate the determinant of the matrix

$$A = \begin{pmatrix} 1 & -2 & 4 \\ 0 & 13 & 11 \\ -2 & 4 & -8 \end{pmatrix}.$$


Theorem C15(a) and Theorem C14(a) together mean that if B is a matrix obtained from a matrix A by interchanging a pair of rows, then det B = −det A. Strategy C5 expands along the top row, so we can significantly simplify the evaluation of a determinant by first interchanging rows to bring a row containing several zeros to the top. The following worked exercise illustrates this.
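Here is a short Python illustration of this idea. It assumes a made-up 4 × 4 matrix whose third row contains three zeros. Interchanging the first and third rows changes the sign of the determinant. Expanding the new matrix along its top row then leaves a single non-zero term.

```python
def det(m):
    """Determinant by cofactor expansion along the top row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


A = [[3, 1, 4, 1],
     [5, 9, 2, 6],
     [0, 0, 7, 0],   # a row with three zeros
     [2, 7, 1, 8]]

# Interchange the first and third rows so that the row with zeros is at the top.
B = [A[2], A[1], A[0], A[3]]

# Expanding B along its top row leaves one term:
# (+1) x 7 x (the minor obtained by deleting row 1 and column 3).
minor = [row[:2] + row[3:] for row in B[1:]]
print(det(B) == 7 * det(minor))  # True
print(det(A) == -det(B))         # True: interchanging two rows changes the sign
print(det(A))                    # 553
```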


Worked Exercise C19 


Evaluate the determinant of the following matrix. 


Exercise C43 


Evaluate the determinant of the following matrix. 


$$\begin{pmatrix} 10 & 3 & -4 & 2 \\ 0 & 2 & 0 & 1 \\ 0 & 6 & 0 & 0 \\ -1 & 2 & 1 & 0 \end{pmatrix}$$


5.4 Determinants and inverses of matrices 


Earlier, in Subsection 5.1, we stated that the inverse of a 2 × 2 matrix A exists if and only if det A ≠ 0. This extends to all square matrices.


Theorem C17 
A square matrix A is invertible if and only if det A ≠ 0.


Proof Let A be an n × n matrix.
We start by proving the only if statement.
First we show that if A is invertible, then det A ≠ 0.

Suppose that A is invertible. Then since AA⁻¹ = Iₙ, it follows from Theorem C14 that

(det A)(det A⁻¹) = det(AA⁻¹) = det Iₙ = 1.
Therefore neither det A nor det A⁻¹ is 0.
We now prove the if statement.




Next we show that if det A ≠ 0, then A is invertible.

Now suppose that det A ≠ 0. Let E₁, …, Eₖ be elementary matrices such that Eₖ ··· E₂E₁A = U, where U is the row-reduced form of A. Each det Eᵢ is non-zero, by Theorem C15. So, using Theorem C14 and the assumption that det A ≠ 0, we have

det U = (det Eₖ) ··· (det E₂)(det E₁)(det A) ≠ 0.

Now this implies, by Theorem C16(a), that U has no zero row, and therefore has a leading 1 in each of its n rows. Hence U = Iₙ, and so, by the Invertibility Theorem (Theorem C7), the matrix A is invertible, with

A⁻¹ = Eₖ ··· E₂E₁. ■


We saw in the proof of Theorem C17 above that if A is invertible, then (det A)(det A⁻¹) = 1. This gives the following useful result.

$$\det A^{-1} = \frac{1}{\det A}$$
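The relationship det A⁻¹ = 1/det A can be checked numerically. The sketch below is a minimal implementation under our own naming; `inverse` is a hypothetical helper. It finds A⁻¹ by row-reducing (A | I) with exact fractions and returns None when the left half cannot be reduced to I. It then compares det A⁻¹ with 1/det A for a sample matrix.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the top row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def inverse(a):
    """Row-reduce (A | I) with exact fractions; return A^-1, or None if A is not invertible."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        # Find a row at or below row c with a non-zero entry in column c.
        p = next((r for r in range(c, n) if m[r][c] != 0), None)
        if p is None:
            return None                   # the left half cannot be reduced to I
        m[c], m[p] = m[p], m[c]           # interchange rows
        pivot = m[c][c]
        m[c] = [x / pivot for x in m[c]]  # create a leading 1
        for r in range(n):                # clear the rest of column c
            if r != c and m[r][c] != 0:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


A = [[2, 1, 0], [1, 3, -1], [0, 4, 5]]
A_inv = inverse(A)
print(det(A), det(A_inv))                 # 33 1/33
print(det(A_inv) == Fraction(1, det(A)))  # True
print(inverse([[1, 2], [2, 4]]))          # None, since this matrix has determinant 0
```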


Until now, if we wanted to show that an n x n matrix A is invertible, we 
had to produce an n x n matrix B such that 


AB =I = BA. 


The next theorem shows that if one of these conditions holds, then the 
other holds automatically. Thus if we want to show that an n x n matrix 
A is invertible, it is enough to produce an n x n matrix B satisfying either 
condition. 


Theorem C18 


Let A and B be square matrices of the same size. Then AB = I if and only if BA = I.


Proof We start by proving the only if statement.
First we show that if AB = I, then BA = I.
Suppose that AB = I. Then, by Theorem C14,
(det A)(det B) = det(AB) = det I = 1.
This implies that
det A ≠ 0 and det B ≠ 0,
so, by Theorem C17, A and B are both invertible.
Now,
A⁻¹ = A⁻¹I,
and we can write I as AB, so
A⁻¹ = A⁻¹(AB) = (A⁻¹A)B = IB = B,
and therefore
BA = A⁻¹A = I,

as required.




To prove the if statement we have to show that if BA = I, then AB = I. We can use exactly the same argument as above with A and B exchanged.


The same argument, with the roles of A and B interchanged, proves the converse. ■
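For a concrete instance of Theorem C18, take the sample 2 × 2 matrices below, which are our own choice. The sketch confirms that AB = I and, as the theorem guarantees, that BA = I as well. It is an illustration, not a substitute for the proof.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


I = [[1, 0], [0, 1]]
A = [[2, 1], [5, 3]]
B = [[3, -1], [-5, 2]]

print(matmul(A, B) == I)  # True
print(matmul(B, A) == I)  # True, as Theorem C18 predicts
```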


We summarise the results on the invertibility of a matrix A as follows. 
This one theorem collects together Theorems C7, C9 and C17. 


Theorem C19 Summary Theorem 


Let A be an n × n matrix. Then the following statements are equivalent.

(a) A is invertible.

(b) det A ≠ 0.

(c) The row-reduced form of A is Iₙ.

(d) The system Ax = b has precisely one solution for each n × 1 matrix b.


(e) The system Ax = 0 has only the trivial solution. 
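Statements (b) and (c) of Theorem C19 can be compared on examples. The Python sketch below uses our own `rref` helper, a standard Gauss-Jordan row reduction with exact fractions. Its steps may differ in detail from Strategy C1, but it yields the same row-reduced form, because that form is unique. For three sample matrices, the sketch checks that det A ≠ 0 exactly when the row-reduced form of A is Iₙ.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the top row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def rref(a):
    """Row-reduced form of a matrix (list of rows), computed with exact fractions."""
    m = [[Fraction(x) for x in row] for row in a]
    rows, cols = len(m), len(m[0])
    lead = 0  # index of the row that should receive the next leading 1
    for c in range(cols):
        p = next((r for r in range(lead, rows) if m[r][c] != 0), None)
        if p is None:
            continue  # no leading 1 in this column
        m[lead], m[p] = m[p], m[lead]
        pivot = m[lead][c]
        m[lead] = [x / pivot for x in m[lead]]
        for r in range(rows):
            if r != lead and m[r][c] != 0:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[lead])]
        lead += 1
        if lead == rows:
            break
    return m


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


examples = [
    [[1, 2], [3, 4]],
    [[1, 2], [2, 4]],
    [[2, 4, 6], [1, 2, 4], [5, 10, 5]],
]
for a in examples:
    # (b) det A != 0   versus   (c) the row-reduced form of A is I_n
    print(det(a) != 0, rref(a) == identity(len(a)))
# True True
# False False
# False False
```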


To conclude this section, we collect together some of the most important 
properties of matrices from this unit. 


Summary of properties of matrices 


Let A and B be two square matrices of the same size. Then
det(AB) = (det A)(det B),
(AB)ᵀ = BᵀAᵀ,
det Aᵀ = det A.
If A and B are invertible, then
(AB)⁻¹ = B⁻¹A⁻¹,
det A⁻¹ = 1/det A.
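The identities (AB)ᵀ = BᵀAᵀ and (AB)⁻¹ = B⁻¹A⁻¹ can be spot-checked as follows. This sketch works with 2 × 2 matrices. For an invertible A = (a b; c d) it uses the standard formula A⁻¹ = (1/det A)(d −b; −c a). The helper names and sample matrices are our own.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse_2x2(a):
    """Inverse of an invertible 2 x 2 matrix: (1/det A) times (d -b; -c a)."""
    (p, q), (r, s) = a
    d = Fraction(p * s - q * r)
    return [[s / d, -q / d], [-r / d, p / d]]


A = [[4, 2], [5, 6]]
B = [[1, 1], [-1, 1]]
AB = matmul(A, B)

print(transpose(AB) == matmul(transpose(B), transpose(A)))        # True
print(inverse_2x2(AB) == matmul(inverse_2x2(B), inverse_2x2(A)))  # True
```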




Summary 


In this unit you have seen that systems of linear equations can have no 
solution, a unique solution or infinitely many solutions, and you have used 
Gauss-Jordan elimination to solve such systems. You have seen that 
matrices can be used in two different ways to represent systems of linear 
equations: both as an augmented matrix and as a matrix equation in 
which the coefficient matrix is multiplied on the right by the matrix of 
unknowns to give the matrix of constant terms. You studied how 
properties of matrices relate to properties of the corresponding systems of 
linear equations. In particular, you saw that if the coefficient matrix of a 
system of linear equations is invertible, or equivalently, if the determinant 
of the coefficient matrix is non-zero, then the system has a unique solution. 
You also saw that the set of m × n matrices with real entries forms an abelian group under addition, and that the set of n × n invertible matrices with real entries forms a group under matrix multiplication. This group is not abelian when n ≥ 2.


You will encounter systems of linear equations throughout the linear algebra units, along with matrices and their properties. Matrices will also appear in the group theory units; in particular, you will work with the group of invertible 2 × 2 matrices in Book E.


Learning outcomes 




After working through this unit, you should be able to: 


• understand the connection between the solutions of systems of linear equations in two and three unknowns, and the intersection of lines and planes in R² and R³
• explain the terms solution set, consistent, inconsistent and homogeneous system of linear equations
• use the method of Gauss-Jordan elimination to find the solutions of systems of linear equations
• describe the three types of elementary operation and elementary row operation
• recognise whether or not a given matrix is in row-reduced form and row-reduce a matrix
• write down the augmented matrix of a system of linear equations, recover a system of linear equations from its augmented matrix, and solve a system of linear equations by row-reducing its augmented matrix
• perform the matrix operations of addition, multiplication and transposition
• recognise the following types of matrix: square, zero, diagonal, lower triangular, upper triangular, identity, symmetric
• express a system of linear equations in matrix form and state the relationship between the invertibility of the coefficient matrix and the number of solutions of the system
• understand what is meant by an invertible matrix and determine whether or not a given matrix is invertible and, if it is, find its inverse
• understand that the set of n × n invertible matrices with real entries forms a group under matrix multiplication
• understand the connections between elementary row operations and elementary matrices
• understand the term determinant of a square matrix, evaluate the determinant of a 2 × 2 matrix and expand along the top row to calculate the determinant of larger matrices
• use determinants to check whether or not a matrix is invertible.




Solutions to exercises 


Solution to Exercise C1 
(a) This is a linear equation. 


(b) This is not a linear equation. The third term involves the product of x₃ and x₄.


(c) This is a linear equation (although not all of 
the five unknowns appear in this equation). 


(d) This is not a linear equation. For example, the second term, x₂x₃, involves a product of unknowns.


Solution to Exercise C2 


A general homogeneous system of m linear equations in n unknowns is

$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= 0 \\ a_{21}x_1 + \cdots + a_{2n}x_n &= 0 \\ &\;\;\vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n &= 0. \end{aligned}$$

We substitute the values x₁ = 0, x₂ = 0, …, xₙ = 0 into the equations of the system:

$$\begin{aligned} a_{11}0 + \cdots + a_{1n}0 &= 0 \\ a_{21}0 + \cdots + a_{2n}0 &= 0 \\ &\;\;\vdots \\ a_{m1}0 + \cdots + a_{mn}0 &= 0. \end{aligned}$$

All the equations are satisfied, whatever the values of the coefficients aᵢⱼ. The solution set therefore contains the trivial solution

x₁ = 0, x₂ = 0, …, xₙ = 0.


Solution to Exercise C3 
We label the equations and apply elementary operations to simplify the system.

$$\begin{array}{ll} r_1 & x + y = 4 \\ r_2 & 2x - y = 5 \end{array}$$

First we eliminate the unknown x from the second equation.

$$\begin{array}{ll} & x + y = 4 \\ r_2 \to r_2 - 2r_1 & -3y = -3 \end{array}$$

We then simplify this equation.

$$\begin{array}{ll} & x + y = 4 \\ r_2 \to -\tfrac{1}{3}r_2 & y = 1 \end{array}$$

We use it to eliminate the unknown y from the first equation of the system.

$$\begin{array}{ll} r_1 \to r_1 - r_2 & x = 3 \\ & y = 1 \end{array}$$

We conclude that there is a unique solution: x = 3, y = 1.

The method above eliminates the unknowns in order; you may have begun by performing the elementary operation r₁ → r₁ + r₂ to eliminate y from r₁. This is also correct.


Solution to Exercise C4 


The explanations in between the systems of three 
linear equations are not a necessary part of the 
solution: they are included for clarity. 


We label the equations and apply elementary 
operations to simplify the system. 


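$$\begin{array}{ll} r_1 & x + y - z = 8 \\ r_2 & 2x - y + z = 1 \\ r_3 & -x + 3y + 2z = -8 \end{array}$$

First use the r₁ equation to eliminate the unknown x from the other equations.

$$\begin{array}{ll} & x + y - z = 8 \\ r_2 \to r_2 - 2r_1 & -3y + 3z = -15 \\ r_3 \to r_3 + r_1 & 4y + z = 0 \end{array}$$

Now simplify r₂.

$$\begin{array}{ll} & x + y - z = 8 \\ r_2 \to -\tfrac{1}{3}r_2 & y - z = 5 \\ & 4y + z = 0 \end{array}$$

Then use r₂ to eliminate the y-terms from r₁ and r₃.

$$\begin{array}{ll} r_1 \to r_1 - r_2 & x = 3 \\ & y - z = 5 \\ r_3 \to r_3 - 4r_2 & 5z = -20 \end{array}$$

Now simplify r₃.

$$\begin{array}{ll} & x = 3 \\ & y - z = 5 \\ r_3 \to \tfrac{1}{5}r_3 & z = -4 \end{array}$$

Then use r₃ to eliminate the z-term from r₂.

$$\begin{array}{ll} & x = 3 \\ r_2 \to r_2 + r_3 & y = 1 \\ & z = -4 \end{array}$$

We conclude that there is a unique solution: x = 3, y = 1, z = −4.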
Solution to Exercise C5 


We label the equations and apply elementary 
operations to simplify the system. 


$$\begin{array}{ll} r_1 & x + 3y - z = 4 \\ r_2 & -x + 2y - 4z = 6 \\ r_3 & x + 2y = 2 \end{array}$$

$$\begin{array}{ll} & x + 3y - z = 4 \\ r_2 \to r_2 + r_1 & 5y - 5z = 10 \\ r_3 \to r_3 - r_1 & -y + z = -2 \end{array}$$

$$\begin{array}{ll} & x + 3y - z = 4 \\ r_2 \to \tfrac{1}{5}r_2 & y - z = 2 \\ & -y + z = -2 \end{array}$$

$$\begin{array}{ll} r_1 \to r_1 - 3r_2 & x + 2z = -2 \\ & y - z = 2 \\ r_3 \to r_3 + r_2 & 0 = 0 \end{array}$$

(The r₃ equation (0 = 0) gives no constraints on x, y and z.)

There are insufficient constraints on the unknowns to determine them uniquely, so the system has an infinite solution set.

(As both remaining equations involve a z-term, set z equal to the real parameter k.)

We write the general solution as

x = −2 − 2k, y = 2 + k, z = k, k ∈ R.

You may have spotted that r₂ (y − z = 2) and r₃ (−y + z = −2) were multiples of each other, and concluded earlier that there are infinitely many solutions; however, the solutions are still needed.
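The solution set is infinite, so a useful check is to substitute the general solution back into the system for a few values of the parameter. The sketch below does this for the equations x + 3y − z = 4, −x + 2y − 4z = 6 and x + 2y = 2. The values of k are arbitrary choices.

```python
from fractions import Fraction


def satisfies(x, y, z):
    """True if (x, y, z) satisfies all three equations of the system."""
    return (x + 3 * y - z == 4) and (-x + 2 * y - 4 * z == 6) and (x + 2 * y == 2)


for k in [Fraction(0), Fraction(1), Fraction(-5, 2), Fraction(7, 3)]:
    x, y, z = -2 - 2 * k, 2 + k, k
    assert satisfies(x, y, z)
print("general solution checked for the sample values of k")
```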




Solution to Exercise C6 


We label the equations, and apply elementary 
operations to simplify the system. 


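$$\begin{array}{ll} r_1 & x + y + z = 6 \\ r_2 & -x + y - 3z = -2 \\ r_3 & 2x + y + 3z = 6 \end{array}$$

$$\begin{array}{ll} & x + y + z = 6 \\ r_2 \to r_2 + r_1 & 2y - 2z = 4 \\ r_3 \to r_3 - 2r_1 & -y + z = -6 \end{array}$$

$$\begin{array}{ll} & x + y + z = 6 \\ r_2 \to \tfrac{1}{2}r_2 & y - z = 2 \\ & -y + z = -6 \end{array}$$

$$\begin{array}{ll} r_1 \to r_1 - r_2 & x + 2z = 4 \\ & y - z = 2 \\ r_3 \to r_3 + r_2 & 0 = -4 \end{array}$$

The r₃ equation is 0 = −4, so we conclude that the solution set is empty: the system is inconsistent.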


You may have spotted that the system is 
inconsistent at an earlier stage and therefore 
stopped then. 


Solution to Exercise C7 
Let the equation of the plane be 


ax + by + cz = d, 


where a, b, c and d are real, and a, b and c are not 
all zero. 


Substituting the points into the equation gives a 
system of three linear equations in the unknowns 
a, b and c. We label the equations and apply 
elementary operations to simplify the system. 


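$$\begin{array}{ll} r_1 & a + 2c = d \\ r_2 & 3b + 4c = d \\ r_3 & a + b + 3c = d \end{array}$$

$$\begin{array}{ll} & a + 2c = d \\ & 3b + 4c = d \\ r_3 \to r_3 - r_1 & b + c = 0 \end{array}$$

$$\begin{array}{ll} & a + 2c = d \\ r_2 \to r_2 - 3r_3 & c = d \\ & b + c = 0 \end{array}$$

$$\begin{array}{ll} & a + 2c = d \\ r_2 \leftrightarrow r_3 & b + c = 0 \\ & c = d \end{array}$$

$$\begin{array}{ll} r_1 \to r_1 - 2r_3 & a = -d \\ r_2 \to r_2 - r_3 & b = -d \\ & c = d \end{array}$$

We conclude that this system has a unique solution (in terms of d): a = −d, b = −d, c = d.

Since a, b and c are not all zero, d ≠ 0. We substitute these expressions into the equation for the plane to get

−dx − dy + dz = d.
Dividing through by −d yields a simpler equation for the plane:

x + y − z = −1.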


Solution to Exercise C8 


The two unknowns are my sister’s age and my brother’s age; let us denote these by s and b (in years), respectively.

The first statement of the problem now translates to the equation

s + b = 40,
and the second statement to
b = s + 12.

We write these two equations in the usual form and label them.

$$\begin{array}{ll} r_1 & s + b = 40 \\ r_2 & -s + b = 12 \end{array}$$

We apply elementary operations to simplify this system.

$$\begin{array}{ll} & s + b = 40 \\ r_2 \to r_2 + r_1 & 2b = 52 \end{array}$$

$$\begin{array}{ll} & s + b = 40 \\ r_2 \to \tfrac{1}{2}r_2 & b = 26 \end{array}$$

$$\begin{array}{ll} r_1 \to r_1 - r_2 & s = 14 \\ & b = 26 \end{array}$$


The system has a unique solution: s = 14, b = 26. 


The answer to the problem is that my sister is 
14 years old. 


Solution to Exercise C9 


(a) The augmented matrix of the system is 


$$\left(\begin{array}{ccc|c} 4 & -2 & 0 & -7 \\ 0 & 1 & 3 & 0 \\ 0 & -3 & 1 & 3 \end{array}\right)$$


(b) The corresponding system is 
+ 7w=1 

= 
w= 2. 


2x + 3y 
y= Tz 
m + 3z—- 


Solution to Exercise C10 


(a) This matrix is not row-reduced as it does not 
have property 3. 
(b) This matrix is row-reduced. 


(c) This matrix is not row-reduced as it does not 
have property 4. 


Solution to Exercise C11 


(a) The augmented matrix corresponds to the 
system 


Ty 


WIN whe 


T2 


2 


The solution is zı = $, T2 = Å. 


(b) The augmented matrix corresponds to the 
system 


x₁ + 6x₃ = 0
x₂ + 7x₃ = 0
0 = 1.


The third equation cannot be satisfied, so there are 
no solutions. 


(c) The augmented matrix corresponds to the 
system 


x₁ + 3x₂ + 2x₄ = −7
x₃ − 3x₄ = 8
x₅ = 11,
that is,
x₁ = −7 − 3x₂ − 2x₄
x₃ = 8 + 3x₄
x₅ = 11.

Setting x₂ = k and x₄ = l (k, l ∈ R), we obtain the general solution

x₁ = −7 − 3k − 2l, x₂ = k, x₃ = 8 + 3l, x₄ = l, x₅ = 11.

(d) The augmented matrix corresponds to the system

x₁ + x₄ = 0
x₂ + 4x₄ = 3
x₃ = 0,
that is,
x₁ = −x₄
x₂ = 3 − 4x₄
x₃ = 0.

Setting x₄ = k (k ∈ R), we obtain the general solution

x₁ = −k, x₂ = 3 − 4k, x₃ = 0, x₄ = k.


Solution to Exercise C12 


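(a) Strategy C1 gives the following sequence of row operations.

$$\begin{array}{l} r_1 \\ r_2 \\ r_3 \\ r_4 \end{array}\left(\begin{array}{cccccc} 1 & 5 & 1 & 4 & 5 & -1 \\ 1 & 5 & 3 & 12 & 11 & 3 \\ 3 & 15 & -1 & -4 & 3 & -6 \\ -2 & -10 & 1 & 2 & -7 & 6 \end{array}\right)\quad\begin{array}{r} 15 \\ 35 \\ 10 \\ -10 \end{array}$$

r₂ → r₂ − r₁, r₃ → r₃ − 3r₁, r₄ → r₄ + 2r₁:

$$\left(\begin{array}{cccccc} 1 & 5 & 1 & 4 & 5 & -1 \\ 0 & 0 & 2 & 8 & 6 & 4 \\ 0 & 0 & -4 & -16 & -12 & -3 \\ 0 & 0 & 3 & 10 & 3 & 4 \end{array}\right)\quad\begin{array}{r} 15 \\ 20 \\ -35 \\ 20 \end{array}$$

r₂ → ½r₂:

$$\left(\begin{array}{cccccc} 1 & 5 & 1 & 4 & 5 & -1 \\ 0 & 0 & 1 & 4 & 3 & 2 \\ 0 & 0 & -4 & -16 & -12 & -3 \\ 0 & 0 & 3 & 10 & 3 & 4 \end{array}\right)\quad\begin{array}{r} 15 \\ 10 \\ -35 \\ 20 \end{array}$$

r₁ → r₁ − r₂, r₃ → r₃ + 4r₂, r₄ → r₄ − 3r₂:

$$\left(\begin{array}{cccccc} 1 & 5 & 0 & 0 & 2 & -3 \\ 0 & 0 & 1 & 4 & 3 & 2 \\ 0 & 0 & 0 & 0 & 0 & 5 \\ 0 & 0 & 0 & -2 & -6 & -2 \end{array}\right)\quad\begin{array}{r} 5 \\ 10 \\ 5 \\ -10 \end{array}$$

r₃ ↔ r₄:

$$\left(\begin{array}{cccccc} 1 & 5 & 0 & 0 & 2 & -3 \\ 0 & 0 & 1 & 4 & 3 & 2 \\ 0 & 0 & 0 & -2 & -6 & -2 \\ 0 & 0 & 0 & 0 & 0 & 5 \end{array}\right)\quad\begin{array}{r} 5 \\ 10 \\ -10 \\ 5 \end{array}$$

r₃ → −½r₃:

$$\left(\begin{array}{cccccc} 1 & 5 & 0 & 0 & 2 & -3 \\ 0 & 0 & 1 & 4 & 3 & 2 \\ 0 & 0 & 0 & 1 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 5 \end{array}\right)\quad\begin{array}{r} 5 \\ 10 \\ 5 \\ 5 \end{array}$$

r₂ → r₂ − 4r₃:

$$\left(\begin{array}{cccccc} 1 & 5 & 0 & 0 & 2 & -3 \\ 0 & 0 & 1 & 0 & -9 & -2 \\ 0 & 0 & 0 & 1 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 5 \end{array}\right)\quad\begin{array}{r} 5 \\ -10 \\ 5 \\ 5 \end{array}$$

r₄ → ⅕r₄:

$$\left(\begin{array}{cccccc} 1 & 5 & 0 & 0 & 2 & -3 \\ 0 & 0 & 1 & 0 & -9 & -2 \\ 0 & 0 & 0 & 1 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{array}\right)\quad\begin{array}{r} 5 \\ -10 \\ 5 \\ 1 \end{array}$$

r₁ → r₁ + 3r₄, r₂ → r₂ + 2r₄, r₃ → r₃ − r₄:

$$\left(\begin{array}{cccccc} 1 & 5 & 0 & 0 & 2 & 0 \\ 0 & 0 & 1 & 0 & -9 & 0 \\ 0 & 0 & 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{array}\right)\quad\begin{array}{r} 8 \\ -8 \\ 4 \\ 1 \end{array}$$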


This is the row-reduced form of the matrix. 


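(b) Strategy C1 gives the following sequence of row operations.

Your sequence may differ from this. In the first step shown below, which corresponds to step 2 of the strategy, row 1 and row 5 are interchanged, but row 1 could have been interchanged with another row instead. However, your final row-reduced matrix should be the same as this one, since (as you will see in Theorem C1) the row-reduced form of a matrix is unique.

$$\begin{array}{l} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \end{array}\left(\begin{array}{cccc} 0 & -8 & 8 & -14 \\ -1 & 0 & -4 & -6 \\ -1 & 8 & -12 & 8 \\ 2 & 8 & 0 & 24 \\ 1 & 4 & 0 & 14 \end{array}\right)\quad\begin{array}{r} -14 \\ -11 \\ 3 \\ 34 \\ 19 \end{array}$$

r₁ ↔ r₅:

$$\left(\begin{array}{cccc} 1 & 4 & 0 & 14 \\ -1 & 0 & -4 & -6 \\ -1 & 8 & -12 & 8 \\ 2 & 8 & 0 & 24 \\ 0 & -8 & 8 & -14 \end{array}\right)\quad\begin{array}{r} 19 \\ -11 \\ 3 \\ 34 \\ -14 \end{array}$$

r₂ → r₂ + r₁, r₃ → r₃ + r₁, r₄ → r₄ − 2r₁:

$$\left(\begin{array}{cccc} 1 & 4 & 0 & 14 \\ 0 & 4 & -4 & 8 \\ 0 & 12 & -12 & 22 \\ 0 & 0 & 0 & -4 \\ 0 & -8 & 8 & -14 \end{array}\right)\quad\begin{array}{r} 19 \\ 8 \\ 22 \\ -4 \\ -14 \end{array}$$

r₂ → ¼r₂:

$$\left(\begin{array}{cccc} 1 & 4 & 0 & 14 \\ 0 & 1 & -1 & 2 \\ 0 & 12 & -12 & 22 \\ 0 & 0 & 0 & -4 \\ 0 & -8 & 8 & -14 \end{array}\right)\quad\begin{array}{r} 19 \\ 2 \\ 22 \\ -4 \\ -14 \end{array}$$

r₁ → r₁ − 4r₂, r₃ → r₃ − 12r₂, r₅ → r₅ + 8r₂:

$$\left(\begin{array}{cccc} 1 & 0 & 4 & 6 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & 0 & -2 \\ 0 & 0 & 0 & -4 \\ 0 & 0 & 0 & 2 \end{array}\right)\quad\begin{array}{r} 11 \\ 2 \\ -2 \\ -4 \\ 2 \end{array}$$

r₃ → −½r₃:

$$\left(\begin{array}{cccc} 1 & 0 & 4 & 6 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & -4 \\ 0 & 0 & 0 & 2 \end{array}\right)\quad\begin{array}{r} 11 \\ 2 \\ 1 \\ -4 \\ 2 \end{array}$$

r₁ → r₁ − 6r₃, r₂ → r₂ − 2r₃, r₄ → r₄ + 4r₃, r₅ → r₅ − 2r₃:

$$\left(\begin{array}{cccc} 1 & 0 & 4 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)\quad\begin{array}{r} 5 \\ 0 \\ 1 \\ 0 \\ 0 \end{array}$$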
This is the row-reduced form of the matrix. 
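Both results of Exercise C12 can be checked with a short Gauss-Jordan routine. The sketch below uses our own `rref` helper with exact fractions. Any valid sequence of row operations gives the same row-reduced form, because that form is unique. The sketch works with the matrices themselves. The extra right-hand column in the working above holds the row sums (each entry equals the sum of the other entries in its row), so it is omitted here.

```python
from fractions import Fraction


def rref(a):
    """Row-reduced form of a matrix (list of rows), computed with exact fractions."""
    m = [[Fraction(x) for x in row] for row in a]
    rows, cols = len(m), len(m[0])
    lead = 0  # index of the row that should receive the next leading 1
    for c in range(cols):
        p = next((r for r in range(lead, rows) if m[r][c] != 0), None)
        if p is None:
            continue  # no leading 1 in this column
        m[lead], m[p] = m[p], m[lead]
        pivot = m[lead][c]
        m[lead] = [x / pivot for x in m[lead]]
        for r in range(rows):
            if r != lead and m[r][c] != 0:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[lead])]
        lead += 1
        if lead == rows:
            break
    return m


c12a = [[1, 5, 1, 4, 5, -1],
        [1, 5, 3, 12, 11, 3],
        [3, 15, -1, -4, 3, -6],
        [-2, -10, 1, 2, -7, 6]]
c12b = [[0, -8, 8, -14],
        [-1, 0, -4, -6],
        [-1, 8, -12, 8],
        [2, 8, 0, 24],
        [1, 4, 0, 14]]

for matrix in (c12a, c12b):
    for row in rref(matrix):
        print([str(x) for x in row])
    print()
```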
Solution to Exercise C13 
We have 
1 3 1 2 € 
Yo T 2 =i —1 1 4 5 9 
03 4 9/ 16 


The row operation has created a 1 in the correct 


position in the current row, but it is not a 
leading 1 because it has changed the 0 at the 


beginning of the row to —1. Performing this row 
operation has destroyed the progress made so far 
on the matrix: the first column no longer contains 


a leading 1 with only zeros above and below. 


Solution to Exercise C14 


We follow Strategy C2 and row-reduce the 
augmented matrix. 


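$$\begin{array}{l} r_1 \\ r_2 \\ r_3 \end{array}\left(\begin{array}{ccccc|c} 1 & -4 & -4 & 3 & 6 & 2 \\ 2 & -5 & -6 & 6 & 9 & 3 \\ 2 & 4 & 0 & 9 & 2 & 0 \end{array}\right)\quad\begin{array}{r} 4 \\ 9 \\ 17 \end{array}$$

r₂ → r₂ − 2r₁, r₃ → r₃ − 2r₁:

$$\left(\begin{array}{ccccc|c} 1 & -4 & -4 & 3 & 6 & 2 \\ 0 & 3 & 2 & 0 & -3 & -1 \\ 0 & 12 & 8 & 3 & -10 & -4 \end{array}\right)\quad\begin{array}{r} 4 \\ 1 \\ 9 \end{array}$$

r₂ → ⅓r₂:

$$\left(\begin{array}{ccccc|c} 1 & -4 & -4 & 3 & 6 & 2 \\ 0 & 1 & \tfrac{2}{3} & 0 & -1 & -\tfrac{1}{3} \\ 0 & 12 & 8 & 3 & -10 & -4 \end{array}\right)\quad\begin{array}{r} 4 \\ \tfrac{1}{3} \\ 9 \end{array}$$

(Note that here we cannot find a row operation that could be performed instead of r₂ → ⅓r₂ to create a leading 1 while avoiding fractions.)

r₁ → r₁ + 4r₂, r₃ → r₃ − 12r₂:

$$\left(\begin{array}{ccccc|c} 1 & 0 & -\tfrac{4}{3} & 3 & 2 & \tfrac{2}{3} \\ 0 & 1 & \tfrac{2}{3} & 0 & -1 & -\tfrac{1}{3} \\ 0 & 0 & 0 & 3 & 2 & 0 \end{array}\right)\quad\begin{array}{r} \tfrac{16}{3} \\ \tfrac{1}{3} \\ 5 \end{array}$$

r₃ → ⅓r₃:

$$\left(\begin{array}{ccccc|c} 1 & 0 & -\tfrac{4}{3} & 3 & 2 & \tfrac{2}{3} \\ 0 & 1 & \tfrac{2}{3} & 0 & -1 & -\tfrac{1}{3} \\ 0 & 0 & 0 & 1 & \tfrac{2}{3} & 0 \end{array}\right)\quad\begin{array}{r} \tfrac{16}{3} \\ \tfrac{1}{3} \\ \tfrac{5}{3} \end{array}$$

r₁ → r₁ − 3r₃:

$$\left(\begin{array}{ccccc|c} 1 & 0 & -\tfrac{4}{3} & 0 & 0 & \tfrac{2}{3} \\ 0 & 1 & \tfrac{2}{3} & 0 & -1 & -\tfrac{1}{3} \\ 0 & 0 & 0 & 1 & \tfrac{2}{3} & 0 \end{array}\right)\quad\begin{array}{r} \tfrac{1}{3} \\ \tfrac{1}{3} \\ \tfrac{5}{3} \end{array}$$

This matrix is in row-reduced form.
The corresponding system is

$$\begin{aligned} x_1 - \tfrac{4}{3}x_3 &= \tfrac{2}{3} \\ x_2 + \tfrac{2}{3}x_3 - x_5 &= -\tfrac{1}{3} \\ x_4 + \tfrac{2}{3}x_5 &= 0, \end{aligned}$$

that is,

$$\begin{aligned} x_1 &= \tfrac{2}{3} + \tfrac{4}{3}x_3 \\ x_2 &= -\tfrac{1}{3} - \tfrac{2}{3}x_3 + x_5 \\ x_4 &= -\tfrac{2}{3}x_5. \end{aligned}$$

Setting x₃ = k and x₅ = l (k, l ∈ R), we obtain the general solution

x₁ = 2/3 + (4/3)k, x₂ = −1/3 − (2/3)k + l, x₃ = k, x₄ = −(2/3)l, x₅ = l.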


Solution to Exercise C15 
Fee) eres, 
mG i+tla w-G ws) 


(c) This sum is undefined since the matrices are 
of different sizes. 


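(a)
$$\begin{pmatrix} 0 & 6 & -2 \\ 1 & 8 & 2 \\ 0 & 3 & 4 \end{pmatrix} + \begin{pmatrix} 1 & 2 & 9 \\ 1 & 0 & 4 \\ 3 & -4 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 8 & 7 \\ 2 & 8 & 6 \\ 3 & -1 & 5 \end{pmatrix}$$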


Solution to Exercise C16 


(a) This difference is undefined since the matrices 
are of different sizes. 


5 8 12 3 10 2 

a (; 2 Ta 9 a 
_ (2 -2 10 
(8 -7 -22 


Solution to Exercise C17 


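(a)
$$4A = 4\begin{pmatrix} 5 & -3 \\ 2 & 3 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 20 & -12 \\ 8 & 12 \\ -4 & 0 \end{pmatrix}$$

(b)
$$4B = 4\begin{pmatrix} 2 & 1 \\ -2 & -7 \\ 3 & 5 \end{pmatrix} = \begin{pmatrix} 8 & 4 \\ -8 & -28 \\ 12 & 20 \end{pmatrix}$$

(c)
$$4A + 4B = \begin{pmatrix} 20 & -12 \\ 8 & 12 \\ -4 & 0 \end{pmatrix} + \begin{pmatrix} 8 & 4 \\ -8 & -28 \\ 12 & 20 \end{pmatrix} = \begin{pmatrix} 28 & -8 \\ 0 & -16 \\ 8 & 20 \end{pmatrix}$$

(d)
$$A + B = \begin{pmatrix} 5 & -3 \\ 2 & 3 \\ -1 & 0 \end{pmatrix} + \begin{pmatrix} 2 & 1 \\ -2 & -7 \\ 3 & 5 \end{pmatrix} = \begin{pmatrix} 7 & -2 \\ 0 & -4 \\ 2 & 5 \end{pmatrix},$$

thus

$$4(A + B) = 4\begin{pmatrix} 7 & -2 \\ 0 & -4 \\ 2 & 5 \end{pmatrix} = \begin{pmatrix} 28 & -8 \\ 0 & -16 \\ 8 & 20 \end{pmatrix}.$$

(Note that 4(A + B) = 4A + 4B.)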


Solution to Exercise C18 


(a) We add corresponding entries of the three matrices A = (aᵢⱼ), B = (bᵢⱼ) and C = (cᵢⱼ). The (i, j)-entry of the matrix A + (B + C) is aᵢⱼ + (bᵢⱼ + cᵢⱼ), and that of (A + B) + C is (aᵢⱼ + bᵢⱼ) + cᵢⱼ. Now, aᵢⱼ, bᵢⱼ and cᵢⱼ are real numbers, so aᵢⱼ + (bᵢⱼ + cᵢⱼ) = (aᵢⱼ + bᵢⱼ) + cᵢⱼ. Therefore

A + (B + C) = (A + B) + C.

(b) We add corresponding entries of the two matrices. The (i, j)-entry of the matrix A + 0 is aᵢⱼ + 0 = aᵢⱼ. Therefore A + 0 = A.

Matrix addition is commutative (property A5), so 0 + A = A also.

(c) Let A = (aᵢⱼ), so −A = (−aᵢⱼ). We add corresponding entries: the (i, j)-entry of the matrix A + (−A) is aᵢⱼ + (−aᵢⱼ) = 0. Thus the matrix A + (−A) is the zero matrix 0.

Matrix addition is commutative (property A5), so −A + A = A + (−A). Thus −A + A is also the zero matrix 0.


Solution to Exercise C19 

Let A = (aᵢⱼ) and B = (bᵢⱼ). Then the (i, j)-entry of k(A + B) is k(aᵢⱼ + bᵢⱼ).

Now, kA = (kaᵢⱼ) and kB = (kbᵢⱼ), so the (i, j)-entry of kA + kB is kaᵢⱼ + kbᵢⱼ = k(aᵢⱼ + bᵢⱼ), since aᵢⱼ, bᵢⱼ and k are real numbers.

The (i, j)-entries of k(A + B) and kA + kB are equal. Thus

k(A + B) = kA + kB.


Solution to Exercise C20 


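(a) The product of a 3 × 2 matrix with a 2 × 1 matrix is a 3 × 1 matrix:

$$\begin{pmatrix} 2 & -1 \\ 0 & 3 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 4 \\ 6 \\ 7 \end{pmatrix}$$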




(b) The product of a 1 x 2 matrix with a 2 x 2 
matrix is a 1 x 2 matrix: 


(2 1) c 5) =@ 14). 


(c) This product is not defined, since the first 
matrix has 1 column and the second has 2 rows. 


(d) The product of a 2 x 1 matrix with a 1 x 3 
matrix is a 2 x 3 matrix: 

—4 

ae 


(a) e 0 -9=(5 o 


(e) The product of a 2 x 3 matrix with a 3 x 3 
matrix is a 2 x 3 matrix: 


312 —2 0 1 
051 1 3 0 
4 1 1 


Solution to Exercise C21 
(a) We first prove the if statement. 


Suppose A and B are square matrices of the same size. Then the product AB can be formed, because A has the same number of columns as B has rows. Likewise, the product BA can be formed. Both the products AB and BA will be square matrices of the same size as A and B.

We now prove the only if statement.

Suppose the products AB and BA are the same size, and suppose A is an m × n matrix and B is a p × r matrix.

Since the product AB is defined, we must have n = p, and therefore the product AB is an m × r matrix.

Since the product BA is defined, we must have r = m, and therefore the product BA is a p × n matrix.

Since the products AB and BA are the same size, m = p and r = n. Combined with n = p and r = m, this implies that p = r = m = n. Therefore, both A and B are square matrices of the same size.


(b) Let A = ({ o) and B = ¢ Then 


in this case. It follows that matrix multiplication is 
not commutative even for square matrices of the 
same size. 


(There are infinitely many possible examples here; 
however, the trick when looking for a 
counterexample is to do as little work as possible: 
setting several of the entries to zero makes the 
multiplication easier!) 


Solution to Exercise C22 


The product of a 2 x 2 matrix with a 2 x 2 matrix 
is a 2 x 2 matrix. 


wa YG -Cia 
mai DG J- 


Note that AB and BA are equal in this case. 


(c) Matrix multiplication is associative, so 
ABC = (AB)C 


“(66 2C 9) a) 
7 e a l 
~ e an) 


(If you worked out ABC = A(BC) instead, then you would have got the same final answer here.)


Solution to Exercise C23 


1 
(a) The matrix | 0 
0 


O Ne 


1 
2 | is upper triangular. 
3 


(b) The matrix

$$\begin{pmatrix} 9 & 0 \\ 0 & 0 \end{pmatrix}$$

is diagonal (so it is also both upper and lower triangular).

(c) The matrix

$$\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 2 \\ 1 & 2 & 3 \end{pmatrix}$$

is not diagonal, upper triangular or lower triangular.


(d) The matrix a i is lower triangular. 


Solution to Exercise C24 


The (i, j)-entry of the product IₘA is obtained by multiplying each entry in the ith row of Iₘ by the corresponding entry in the jth column of A and adding the results. Now, the ith row of Iₘ has a 1 in the ith position and zeros elsewhere. Therefore the only term in this sum that can be non-zero is the ith entry of the jth column of A, that is, the (i, j)-entry of A. Thus IₘA = A.

The (i, j)-entry of the product AIₙ is obtained by multiplying each entry in the ith row of A by the corresponding entry in the jth column of Iₙ and adding the results. Now, the jth column of Iₙ has a 1 in the jth position and zeros elsewhere. Therefore the only term in this sum that can be non-zero is the jth entry of the ith row of A, that is, the (i, j)-entry of A. Thus AIₙ = A.


Solution to Exercise C25 


(a) The transpose of a 3 x 2 matrix is a 2 x 3 
matrix: 


T 
: ‘ er 
“e.g 4 2 10 


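(b) The transpose of a 3 × 3 matrix is a 3 × 3 matrix:

$$\begin{pmatrix} 2 & 1 & 2 \\ 0 & 3 & -5 \\ 4 & 7 & 0 \end{pmatrix}^{T} = \begin{pmatrix} 2 & 0 & 4 \\ 1 & 3 & 7 \\ 2 & -5 & 0 \end{pmatrix}$$

(c) The transpose of a 1 × 3 matrix is a 3 × 1 matrix:

$$\begin{pmatrix} 10 & 4 & 6 \end{pmatrix}^{T} = \begin{pmatrix} 10 \\ 4 \\ 6 \end{pmatrix}$$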


(d) The transpose of a 2 x 2 matrix is a 2 x 2 
matrix: 


T 




Solution to Exercise C26 
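(a) Here,

$$A^{T} = \begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{pmatrix}, \qquad B^{T} = \begin{pmatrix} 7 & 9 & 11 \\ 8 & 10 & 12 \end{pmatrix}$$

and

$$A + B = \begin{pmatrix} 8 & 10 \\ 12 & 14 \\ 16 & 18 \end{pmatrix}.$$

So

$$(A + B)^{T} = \begin{pmatrix} 8 & 12 & 16 \\ 10 & 14 & 18 \end{pmatrix}$$

and

$$A^{T} + B^{T} = \begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{pmatrix} + \begin{pmatrix} 7 & 9 & 11 \\ 8 & 10 & 12 \end{pmatrix} = \begin{pmatrix} 8 & 12 & 16 \\ 10 & 14 & 18 \end{pmatrix} = (A + B)^{T}.$$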


So 


acy" = (5 i ne 


The product AᵀCᵀ cannot be formed, since Aᵀ is a 2 × 3 matrix and Cᵀ is a 2 × 2 matrix.
The product CᵀAᵀ does, however, exist:


penr fk Vet. 3.5 
a. 2 4 6 
(2-7 ti 
~\o 4 6 


= (AC). 




Solution to Exercise C27 


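Suppose, for a contradiction, that there exists a matrix

$$B = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

such that AB = I, that is,

$$\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

Multiplying the matrices on the left-hand side gives

$$\begin{pmatrix} a - c & b - d \\ -a + c & -b + d \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

Looking at the entries in the first column, we must have a − c = 1 and −a + c = 0, that is, a − c = 1 and a − c = 0. This contradiction shows that there exists no such matrix B. (The same conclusion arises from looking at the entries in the second column.)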


Solution to Exercise C28 
The equation II = I shows that I is invertible, with 


inverse I. 


Solution to Exercise C29 


To prove that AB is invertible, with inverse B⁻¹A⁻¹, we have to show that

(AB)(B⁻¹A⁻¹) = I = (B⁻¹A⁻¹)(AB).

By the associative property (M2),
(AB)(B⁻¹A⁻¹) = A(BB⁻¹)A⁻¹ = AIA⁻¹ = AA⁻¹ = I

and, similarly,

(B⁻¹A⁻¹)(AB) = B⁻¹(A⁻¹A)B = B⁻¹IB = B⁻¹B = I.

Therefore AB is invertible, with inverse B⁻¹A⁻¹.




Solution to Exercise C30 
(a) We row-reduce (A |T). 


rı 


G 


1 
r2 > ro — 4rı 0 


1 
ro —> -—šr2 0 


rı > rı — 2r2 


0 


O NIN 


CO NIN 


Nio NIN 


= 
sJ 


ial 
~io e 


The left half has been reduced to I, so A is 
invertible; its inverse is 


| 
= 
Il 
oS 
l 
NSIN Tj- 
l 
NFR Ni 
Ni 


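(b) We row-reduce (B | I).

$$\begin{array}{l} r_1 \\ r_2 \\ r_3 \end{array}\left(\begin{array}{ccc|ccc} 1 & 1 & -4 & 1 & 0 & 0 \\ 2 & 1 & -6 & 0 & 1 & 0 \\ -3 & -1 & 9 & 0 & 0 & 1 \end{array}\right)$$

r₂ → r₂ − 2r₁, r₃ → r₃ + 3r₁:

$$\left(\begin{array}{ccc|ccc} 1 & 1 & -4 & 1 & 0 & 0 \\ 0 & -1 & 2 & -2 & 1 & 0 \\ 0 & 2 & -3 & 3 & 0 & 1 \end{array}\right)$$

r₂ → −r₂:

$$\left(\begin{array}{ccc|ccc} 1 & 1 & -4 & 1 & 0 & 0 \\ 0 & 1 & -2 & 2 & -1 & 0 \\ 0 & 2 & -3 & 3 & 0 & 1 \end{array}\right)$$

r₁ → r₁ − r₂, r₃ → r₃ − 2r₂:

$$\left(\begin{array}{ccc|ccc} 1 & 0 & -2 & -1 & 1 & 0 \\ 0 & 1 & -2 & 2 & -1 & 0 \\ 0 & 0 & 1 & -1 & 2 & 1 \end{array}\right)$$

r₁ → r₁ + 2r₃, r₂ → r₂ + 2r₃:

$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -3 & 5 & 2 \\ 0 & 1 & 0 & 0 & 3 & 2 \\ 0 & 0 & 1 & -1 & 2 & 1 \end{array}\right)$$

The left half has been reduced to I, so B is invertible; its inverse is

$$B^{-1} = \begin{pmatrix} -3 & 5 & 2 \\ 0 & 3 & 2 \\ -1 & 2 & 1 \end{pmatrix}.$$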
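As a check on the inverse found in part (b), multiplying B by B⁻¹ should give I. By Theorem C18 one product is enough, but the sketch computes both. The two matrices are those obtained in the working above, and the `matmul` helper is our own.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


B = [[1, 1, -4], [2, 1, -6], [-3, -1, 9]]
B_inv = [[-3, 5, 2], [0, 3, 2], [-1, 2, 1]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(matmul(B, B_inv) == I)  # True
print(matmul(B_inv, B) == I)  # True
```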


(c) We row-reduce (C | I). 


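$$\begin{array}{l} r_1 \\ r_2 \\ r_3 \end{array}\left(\begin{array}{ccc|ccc} 2 & 4 & 6 & 1 & 0 & 0 \\ 1 & 2 & 4 & 0 & 1 & 0 \\ 5 & 10 & 5 & 0 & 0 & 1 \end{array}\right)\quad\begin{array}{r} 13 \\ 8 \\ 21 \end{array}$$

r₁ → ½r₁:

$$\left(\begin{array}{ccc|ccc} 1 & 2 & 3 & \tfrac{1}{2} & 0 & 0 \\ 1 & 2 & 4 & 0 & 1 & 0 \\ 5 & 10 & 5 & 0 & 0 & 1 \end{array}\right)\quad\begin{array}{r} \tfrac{13}{2} \\ 8 \\ 21 \end{array}$$

r₂ → r₂ − r₁, r₃ → r₃ − 5r₁:

$$\left(\begin{array}{ccc|ccc} 1 & 2 & 3 & \tfrac{1}{2} & 0 & 0 \\ 0 & 0 & 1 & -\tfrac{1}{2} & 1 & 0 \\ 0 & 0 & -10 & -\tfrac{5}{2} & 0 & 1 \end{array}\right)\quad\begin{array}{r} \tfrac{13}{2} \\ \tfrac{3}{2} \\ -\tfrac{23}{2} \end{array}$$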


The usual strategy for row-reduction has created a 
leading 1 in the second row that does not lie on the 
main diagonal of the left half. Hence the left half 
cannot reduce to I, and therefore C is not 
invertible. 


Solution to Exercise C31 


The matrix form of the system is 


1 1 N fz 1 
-1 0 -4]ly)=[ 2 
3 2 i0/ \z =i 


Multiplying this equation on the left by the inverse 
of the coefficient matrix gives the solution 


x en 1 0 

y)={-1 2 1 =| 23 
1 1 1 

Z “Loy 3/ \l -3 


that is, 7 = 0, y = 2, z = — 


Solution to Exercise C32 


@ (oG 2 =G 2 4) 


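From left to right, the three matrices are: the elementary matrix associated with r₁ → 5r₁; the matrix A; and the matrix obtained when r₁ → 5r₁ is performed on A.

(b)
$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 7 & 8 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 24 & 28 \\ 5 & 6 \\ 7 & 8 \end{pmatrix}$$

From left to right, the three matrices are: the elementary matrix associated with r₂ → r₂ + 3r₄; the matrix B; and the matrix obtained when r₂ → r₂ + 3r₄ is performed on B.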




Solution to Exercise C33 


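(a) The inverse elementary row operation of r₁ → r₁ − 2r₂ is r₁ → r₁ + 2r₂.

The working below shows the sequence of two row operations performed on A.

$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \xrightarrow{\;r_1 \to r_1 - 2r_2\;} \begin{pmatrix} -7 & -8 & -9 \\ 4 & 5 & 6 \end{pmatrix} \xrightarrow{\;r_1 \to r_1 + 2r_2\;} \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$$

(b) The inverse elementary row operation of r₁ ↔ r₂ is r₁ ↔ r₂.

The working below shows the sequence of two row operations performed on A.

$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \xrightarrow{\;r_1 \leftrightarrow r_2\;} \begin{pmatrix} 4 & 5 & 6 \\ 1 & 2 & 3 \end{pmatrix} \xrightarrow{\;r_1 \leftrightarrow r_2\;} \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$$

(c) The inverse elementary row operation of r₂ → −3r₂ is r₂ → −⅓r₂.

The working below shows the sequence of two row operations performed on A.

$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \xrightarrow{\;r_2 \to -3r_2\;} \begin{pmatrix} 1 & 2 & 3 \\ -12 & -15 & -18 \end{pmatrix} \xrightarrow{\;r_2 \to -\frac{1}{3}r_2\;} \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$$

Solution to Exercise C34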


Matrix A has associated elementary row operation r₁ → 2r₁, which has inverse r₁ → ½r₁. The inverse of A is the elementary matrix associated with this inverse row operation, which is

$$A^{-1} = \begin{pmatrix} \tfrac{1}{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$




Solution to Exercise C35 


(a) ; J = (6x 2)- (x4) =6 


(b) = E = (10 x 2)— (—4 x (—5)) =0 


7 3 
(c) liz 9 


|= (7x2) - (x10) =-37 


Solution to Exercise C36 


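(a) First we evaluate the determinant of the matrix:

$$\begin{vmatrix} 4 & 2 \\ 5 & 6 \end{vmatrix} = (4 \times 6) - (2 \times 5) = 14.$$

This determinant is non-zero, so the matrix is invertible. We use the formula to find the inverse:

$$\begin{pmatrix} 4 & 2 \\ 5 & 6 \end{pmatrix}^{-1} = \frac{1}{14}\begin{pmatrix} 6 & -2 \\ -5 & 4 \end{pmatrix} = \begin{pmatrix} \tfrac{3}{7} & -\tfrac{1}{7} \\ -\tfrac{5}{14} & \tfrac{2}{7} \end{pmatrix}.$$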


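(b) First we evaluate the determinant of the matrix:

$$\begin{vmatrix} 1 & 1 \\ -1 & 1 \end{vmatrix} = (1 \times 1) - (1 \times (-1)) = 2.$$

This determinant is non-zero, so the matrix is invertible. We use the formula to find the inverse:

$$\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}^{-1} = \frac{1}{2}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} \tfrac{1}{2} & -\tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}.$$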


(c) First we evaluate the determinant of the matrix:

$$\begin{vmatrix} 1 & -1 \\ -1 & 1 \end{vmatrix} = (1 \times 1) - (-1 \times (-1)) = 0.$$

This determinant is 0, so the matrix is not invertible.




Solution to Exercise C37 


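(a) We have

$$\begin{aligned} \begin{vmatrix} 3 & 2 & 1 \\ 4 & 0 & -1 \\ 0 & -1 & 1 \end{vmatrix} &= 3\begin{vmatrix} 0 & -1 \\ -1 & 1 \end{vmatrix} - 2\begin{vmatrix} 4 & -1 \\ 0 & 1 \end{vmatrix} + 1\begin{vmatrix} 4 & 0 \\ 0 & -1 \end{vmatrix} \\ &= 3\big((0 \times 1) - (-1 \times (-1))\big) - 2\big((4 \times 1) - (-1 \times 0)\big) + \big((4 \times (-1)) - (0 \times 0)\big) \\ &= -15. \end{aligned}$$

(b) We have

$$\begin{aligned} \begin{vmatrix} 2 & 10 & 0 \\ 3 & -1 & 2 \\ 5 & 9 & 2 \end{vmatrix} &= 2\begin{vmatrix} -1 & 2 \\ 9 & 2 \end{vmatrix} - 10\begin{vmatrix} 3 & 2 \\ 5 & 2 \end{vmatrix} + 0 \\ &= 2\big((-1 \times 2) - (2 \times 9)\big) - 10\big((3 \times 2) - (2 \times 5)\big) \\ &= 0. \end{aligned}$$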
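The two expansions can be reproduced with a few lines of Python. The sketch works with 3 × 3 matrices stored as lists of rows. Our own helper `top_row_terms` returns the three signed top-row terms, so each one can be compared with the working above.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def top_row_terms(m):
    """Signed terms of a 3 x 3 determinant expanded along the top row."""
    terms = []
    for j in range(3):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        terms.append((-1) ** j * m[0][j] * det2(minor))
    return terms


a = [[3, 2, 1], [4, 0, -1], [0, -1, 1]]
b = [[2, 10, 0], [3, -1, 2], [5, 9, 2]]
for m in (a, b):
    terms = top_row_terms(m)
    print(terms, sum(terms))
# [-3, -8, -4] -15
# [-40, 40, 0] 0
```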


Solution to Exercise C38 


The cofactor A₁₃ is (−1)¹⁺³ = (−1)⁴ = 1 times the
determinant of the submatrix obtained by 
removing the top row and third column of A: 


2 35 


1 

3 2 

Ai |) 5 3 
5 1 4 


w Ne 


The cofactor A₄₅ is (−1)⁴⁺⁵ = (−1)⁹ = −1 times
the determinant of the submatrix obtained by 
removing the fourth row and fifth column of A: 


1234 


2 
Ass =- |3 


5 


= A ù% 


4 5 
5 1) 
2 3 




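Solution to Exercise C39
We apply Strategy C5, expanding along the top row and then along the top rows of the resulting determinants. This gives

(−2)(−2) + (−1)4 + (−3)(−4) = 12.

Solution to Exercise C40
Here,

$$\det A = \begin{vmatrix} -3 & 1 \\ 2 & -4 \end{vmatrix} = (-3 \times (-4)) - (1 \times 2) = 10,$$

$$\det B = \begin{vmatrix} 1 & 1 \\ -2 & 5 \end{vmatrix} = (1 \times 5) - (1 \times (-2)) = 7$$

and

$$\det(A + B) = \begin{vmatrix} -2 & 2 \\ 0 & 1 \end{vmatrix} = (-2 \times 1) - (2 \times 0) = -2.$$

We have det A + det B = 10 + 7 = 17, and so det(A + B) is not equal to det A + det B.

Next,

$$AB = \begin{pmatrix} -3 & 1 \\ 2 & -4 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ -2 & 5 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 10 & -18 \end{pmatrix},$$

so

det(AB) = (−5 × (−18)) − (2 × 10) = 70.

We have (det A)(det B) = 10 × 7 = 70, and so det(AB) = (det A)(det B).

Solution to Exercise C41
(a) We apply Strategy C5:

$$\begin{vmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{vmatrix} = 0\begin{vmatrix} 0 & 0 \\ 0 & 1 \end{vmatrix} - 1\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} + 0\begin{vmatrix} 1 & 0 \\ 0 & 0 \end{vmatrix} = -1.$$

(b) We apply Strategy C5:

$$\begin{vmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & k & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} = 1\begin{vmatrix} 1 & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & 1 \end{vmatrix} - 0 + 0 - 0 = 1\begin{vmatrix} k & 0 \\ 0 & 1 \end{vmatrix} - 0 + 0 = k.$$

(c) We evaluate the determinant:

$$\begin{vmatrix} 1 & 0 \\ k & 1 \end{vmatrix} = (1 \times 1) - (0 \times k) = 1.$$

Solution to Exercise C42
First notice that

−2(1 −2 4) = (−2 4 −8),

that is, the first and third rows of A are proportional. Therefore, by Theorem C16,

det A = 0.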
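The following short numerical check of Exercise C40 uses the matrices A = (−3 1; 2 −4) and B = (1 1; −2 5) from the working above. It shows that the determinant is not additive but is multiplicative. The helpers `det2`, `add` and `matmul` are our own.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[-3, 1], [2, -4]]
B = [[1, 1], [-2, 5]]

print(det2(A), det2(B))    # 10 7
print(det2(add(A, B)))     # -2, not 10 + 7
print(matmul(A, B))        # [[-5, 2], [10, -18]]
print(det2(matmul(A, B)))  # 70, which equals 10 x 7
```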




Solution to Exercise C43 


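We interchange the first and third rows, and apply Theorems C14 and C15, giving

$$\det A = \begin{vmatrix} 10 & 3 & -4 & 2 \\ 0 & 2 & 0 & 1 \\ 0 & 6 & 0 & 0 \\ -1 & 2 & 1 & 0 \end{vmatrix} = -\begin{vmatrix} 0 & 6 & 0 & 0 \\ 0 & 2 & 0 & 1 \\ 10 & 3 & -4 & 2 \\ -1 & 2 & 1 & 0 \end{vmatrix}.$$

We use Strategy C5 to evaluate this determinant:

$$\begin{aligned} \det A &= (-1)\left(0 - 6\begin{vmatrix} 0 & 0 & 1 \\ 10 & -4 & 2 \\ -1 & 1 & 0 \end{vmatrix} + 0 - 0\right) \\ &= 6\left(0 - 0 + \begin{vmatrix} 10 & -4 \\ -1 & 1 \end{vmatrix}\right) \\ &= 6\big((10 \times 1) - (-4 \times (-1))\big) \\ &= 36. \end{aligned}$$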
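The value 36 can also be confirmed directly, without the row interchange, by expanding along the top row of A. The sketch below does this, and it also evaluates the row-interchanged matrix to show the sign change. The `det` helper is our own top-row cofactor expansion.

```python
def det(m):
    """Determinant by cofactor expansion along the top row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


A = [[10, 3, -4, 2],
     [0, 2, 0, 1],
     [0, 6, 0, 0],
     [-1, 2, 1, 0]]
B = [A[2], A[1], A[0], A[3]]  # first and third rows interchanged

print(det(A), det(B))  # 36 -36
```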




Unit C2 
Vector spaces 


Introduction 


In this unit you will meet a mathematical structure that is one of the most important unifying concepts of pure mathematics: the vector space. A vector space consists of a set of elements called vectors, and two operations: addition of vectors and multiplication by a scalar. These vectors need not be vectors in the geometric sense given in Book A. Instead, they may be any of a wide range of objects, including complex numbers, functions and matrices.


You will first consider properties of $\mathbb{R}^2$ and $\mathbb{R}^3$, and see how these two- and three-dimensional spaces lead not only to $n$-dimensional space $\mathbb{R}^n$, but also to the formal definition of a vector space. You will meet a variety of quite different vector spaces and study various concepts relating to vector spaces. For example, you will meet the idea of a subspace of a vector space, which is a subset of a vector space that is itself a vector space; this is similar to the relationship between subgroups and groups, which you met in Book B.


The theory of vector spaces introduced in this unit will underpin the 
remaining units of this book. 


1 Vector spaces 


In Book A you met the plane and three-dimensional space. In this section 
you will see that properties that you are familiar with in these two- and 
three-dimensional spaces also hold for other, quite different-looking spaces. 


1.1 Euclidean spaces 


Recall from Unit A1 Sets, functions and vectors that $\mathbb{R}^2$ is the set of all ordered pairs of real numbers, and $\mathbb{R}^3$ is the set of all ordered triples of real numbers. You saw that we can interpret these sets as the plane and as three-dimensional space, respectively, in the following two ways. We can interpret their elements first as the coordinates of points with respect to a specified coordinate system, and second as vectors in component form with respect to this coordinate system.


In this way, once axes have been specified, we can consider the elements of $\mathbb{R}^2$ equivalently as ordered pairs, as points in the plane, or as vectors in the plane. Likewise, we can consider the elements of $\mathbb{R}^3$ equivalently as ordered triples of real numbers, as points in three-dimensional space or as vectors in three-dimensional space.


Also in Unit A1, you met two operations: addition of vectors and multiplication of a vector by a scalar. These operations are defined on $\mathbb{R}^2$ and $\mathbb{R}^3$ as follows.




Definitions 


In $\mathbb{R}^2$, the set of ordered pairs of real numbers, the operations of addition and of multiplication by a scalar are defined as:
$$(u_1, u_2) + (v_1, v_2) = (u_1 + v_1, u_2 + v_2),$$
$$\alpha(u_1, u_2) = (\alpha u_1, \alpha u_2), \quad\text{where } \alpha \in \mathbb{R}.$$

In $\mathbb{R}^3$, the set of ordered triples of real numbers, the operations of addition and of multiplication by a scalar are defined as:
$$(u_1, u_2, u_3) + (v_1, v_2, v_3) = (u_1 + v_1, u_2 + v_2, u_3 + v_3),$$
$$\alpha(u_1, u_2, u_3) = (\alpha u_1, \alpha u_2, \alpha u_3), \quad\text{where } \alpha \in \mathbb{R}.$$


It turns out that $\mathbb{R}^2$ and $\mathbb{R}^3$ are particular instances of a class of mathematical structures called vector spaces. In this unit you will meet many other examples, and study the properties that are common to all of them.

You are familiar with vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$, but there is no reason to stop at $\mathbb{R}^3$. Why not consider $\mathbb{R}^4$, $\mathbb{R}^5$, or even $\mathbb{R}^n$, for larger positive integers $n$?


Definitions 


Let $n$ be a positive integer. An ordered $n$-tuple is a sequence of real numbers $(u_1, u_2, \ldots, u_n)$. The set of all ordered $n$-tuples is called $n$-dimensional space, and is denoted by $\mathbb{R}^n$.


To highlight the connection between $n$-dimensional space $\mathbb{R}^n$ (for a positive integer $n$) and 2- and 3-dimensional space with geometrical vectors, the space $\mathbb{R}^n$ is often called a Euclidean space and its elements $(u_1, u_2, \ldots, u_n)$ are called vectors. For example, $\mathbb{R}^4$ is the four-dimensional Euclidean space of vectors with four components.


Although it is difficult to visualise vectors in spaces with dimension greater 
than three, it is possible to carry out exactly the same algebraic 
manipulations with these vectors, and it turns out that these spaces are 
also vector spaces. 


Vector addition and scalar multiplication in $\mathbb{R}^n$ are defined as in $\mathbb{R}^2$ and $\mathbb{R}^3$.


Definitions 
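Let
$$\mathbf{u} = (u_1, u_2, \ldots, u_n) \quad\text{and}\quad \mathbf{v} = (v_1, v_2, \ldots, v_n)$$
be two vectors in $\mathbb{R}^n$. The operations of addition and of multiplication by a scalar are defined as:
$$\begin{aligned}
\mathbf{u} + \mathbf{v} &= (u_1, u_2, \ldots, u_n) + (v_1, v_2, \ldots, v_n) \\
&= (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n),
\end{aligned}$$
$$\alpha\mathbf{u} = (\alpha u_1, \alpha u_2, \ldots, \alpha u_n), \quad\text{where } \alpha \in \mathbb{R}.$$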
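These component-wise definitions translate directly into code. The following minimal Python sketch stores vectors in $\mathbb{R}^n$ as tuples of numbers; that representation is our own choice, and `add` and `scale` are hypothetical helper names. It reproduces Worked Exercise C20 for the particular case $n = 5$.

```python
def add(u, v):
    """Component-wise sum of two vectors in R^n, given as tuples."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return tuple(ui + vi for ui, vi in zip(u, v))


def scale(alpha, u):
    """Scalar multiple alpha*u, computed component by component."""
    return tuple(alpha * ui for ui in u)


# Worked Exercise C20 with the particular choice n = 5.
n = 5
u = tuple([1] * n)           # (1, 1, ..., 1)
v = tuple(range(1, n + 1))   # (1, 2, ..., n)
print(add(u, v))     # (2, 3, 4, 5, 6), i.e. (2, 3, ..., n + 1)
print(scale(2, u))   # (2, 2, 2, 2, 2)
```

Refusing to add tuples of different lengths reflects the fact that addition is only defined between two vectors of the same space $\mathbb{R}^n$.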


Worked Exercise C20 


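Let $\mathbf{u} = (1, 1, \ldots, 1)$ and $\mathbf{v} = (1, 2, \ldots, n)$ be two vectors in $\mathbb{R}^n$. Form the vectors $\mathbf{u} + \mathbf{v}$ and $2\mathbf{u}$.

Solution
$$\mathbf{u} + \mathbf{v} = (1, 1, \ldots, 1) + (1, 2, \ldots, n) = (2, 3, \ldots, n + 1)$$
$$2\mathbf{u} = 2(1, 1, \ldots, 1) = (2, 2, \ldots, 2)$$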


Exercise C44 


Let $\mathbf{u} = (1, -1, 2, 0, -3)$ and $\mathbf{v} = (0, 2, -1, 4, 0)$ be two vectors in $\mathbb{R}^5$. Form the vectors $\mathbf{u} + \mathbf{v}$ and $-3\mathbf{u}$.


This method of generalisation (here from $\mathbb{R}^2$ and $\mathbb{R}^3$ to $\mathbb{R}^n$) is common throughout mathematics. We start with spaces like $\mathbb{R}^2$ and $\mathbb{R}^3$ that we can visualise and look at their properties, and then we generalise these properties to spaces that we cannot easily visualise, such as $\mathbb{R}^n$. So we go from particular cases to a general case.


We can go even further, and think of a vector with a never-ending list of components $(v_1, v_2, v_3, \ldots)$. This is hard to visualise, but is not difficult to handle mathematically. The set of such vectors is called $\mathbb{R}^\infty$, and is an infinite-dimensional vector space. (You will meet a formal definition of the dimension of a vector space in Section 3.) Vector addition and scalar multiplication are again performed component-wise.
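A vector in $\mathbb{R}^\infty$ cannot be stored component by component. One way to experiment with it, assumed here purely for illustration, is to represent a vector by a rule giving its $k$th component. The component-wise operations then become operations on rules, and we can print as many components as we like. The sketch below uses the vectors of Worked Exercise C21, assuming that the displayed patterns continue. All function names are hypothetical.

```python
def add(u, v):
    """Sum of two vectors of R^infinity, each given as a rule k -> k-th component."""
    return lambda k: u(k) + v(k)


def scale(alpha, u):
    """Scalar multiple alpha*u, again as a rule for the k-th component."""
    return lambda k: alpha * u(k)


def first(u, m):
    """The first m components (k = 1, ..., m) of u, as a tuple."""
    return tuple(u(k) for k in range(1, m + 1))


def u(k):
    return k % 2                 # (1, 0, 1, 0, 1, ...)


def v(k):
    return (-1) ** (k + 1) * k   # (1, -2, 3, -4, 5, ...)


print(first(add(u, v), 6))    # (2, -2, 4, -4, 6, -6)
print(first(scale(5, u), 6))  # (5, 0, 5, 0, 5, 0)
```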




Worked Exercise C21 


Let $\mathbf{u} = (1, 0, 1, 0, 1, \ldots)$ and $\mathbf{v} = (1, -2, 3, -4, 5, \ldots)$ be two vectors in $\mathbb{R}^\infty$. Form the vectors $\mathbf{u} + \mathbf{v}$ and $5\mathbf{u}$.


1.2 Real vector spaces 


Before meeting the definition of a vector space, we will look at $\mathbb{R}^4$ and a set of polynomials, and observe that, despite their apparent differences, these sets share many important properties.


The space $\mathbb{R}^4$


A vector in $\mathbb{R}^4$ has the form $(v_1, v_2, v_3, v_4)$, where $v_1, v_2, v_3$ and $v_4$ are real numbers, and the operations of vector addition and scalar multiplication are as defined in the previous subsection.
If we have two vectors $\mathbf{u} = (u_1, u_2, u_3, u_4)$ and $\mathbf{v} = (v_1, v_2, v_3, v_4)$ in $\mathbb{R}^4$, then their sum is
$$\begin{aligned}
\mathbf{u} + \mathbf{v} &= (u_1, u_2, u_3, u_4) + (v_1, v_2, v_3, v_4) \\
&= (u_1 + v_1, u_2 + v_2, u_3 + v_3, u_4 + v_4).
\end{aligned}$$
This last vector also belongs to $\mathbb{R}^4$ because each of its four components is a real number, so $\mathbb{R}^4$ is closed under vector addition. That is, the closure property (A1), which you met in Unit A2 Number systems, holds for the addition of vectors in $\mathbb{R}^4$.


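For example, if $\mathbf{u} = (1, 3, 5, 7)$ and $\mathbf{v} = (2, -1, -5, 6)$ are vectors in $\mathbb{R}^4$, then
$$\mathbf{u} + \mathbf{v} = (1, 3, 5, 7) + (2, -1, -5, 6) = (3, 2, 0, 13),$$
which is a vector in $\mathbb{R}^4$.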


In fact, addition of vectors in $\mathbb{R}^4$ satisfies all the usual rules of arithmetic, as follows. The next worked exercise proves the commutative property (A5) and the additive identity property (A3). You are asked to prove the remaining two properties in the following exercise.


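Addition of vectors in $\mathbb{R}^4$

A1 Closure For all $\mathbf{u}, \mathbf{v} \in \mathbb{R}^4$,
$\mathbf{u} + \mathbf{v} \in \mathbb{R}^4$.

A2 Associativity For all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathbb{R}^4$,
$(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$.

A3 Additive identity For all $\mathbf{v} \in \mathbb{R}^4$, with $\mathbf{0} = (0, 0, 0, 0) \in \mathbb{R}^4$,
$\mathbf{v} + \mathbf{0} = \mathbf{v} = \mathbf{0} + \mathbf{v}$.

A4 Additive inverses For each $\mathbf{v} \in \mathbb{R}^4$, there is a vector $-\mathbf{v} \in \mathbb{R}^4$ such that
$\mathbf{v} + (-\mathbf{v}) = \mathbf{0} = -\mathbf{v} + \mathbf{v}$.

A5 Commutativity For all $\mathbf{u}, \mathbf{v} \in \mathbb{R}^4$,
$\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$.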
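A computer cannot prove these properties for all vectors, but it can spot-check them on particular ones. In the minimal Python sketch below, tuples stand for vectors in $\mathbb{R}^4$ and the helper names are hypothetical. It tests A1 to A5 for three sample vectors. Integer components keep the comparisons exact. A check like this illustrates the properties; it does not prove them.

```python
def add(u, v):
    """Component-wise sum of two vectors in R^4 (tuples)."""
    return tuple(a + b for a, b in zip(u, v))


def neg(v):
    """The additive inverse -v, obtained by negating each component."""
    return tuple(-a for a in v)


ZERO = (0, 0, 0, 0)


def check_addition_axioms(u, v, w):
    """Report whether A1-A5 hold for these particular vectors (not a proof)."""
    return {
        "A1": len(add(u, v)) == 4,
        "A2": add(add(u, v), w) == add(u, add(v, w)),
        "A3": add(v, ZERO) == v == add(ZERO, v),
        "A4": add(v, neg(v)) == ZERO == add(neg(v), v),
        "A5": add(u, v) == add(v, u),
    }


print(check_addition_axioms((1, 3, 5, 7), (2, -1, -5, 6), (0, 4, -2, 9)))
# {'A1': True, 'A2': True, 'A3': True, 'A4': True, 'A5': True}
```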


Worked Exercise C22 


Prove that the following properties hold for vector addition in $\mathbb{R}^4$.


(a) The commutative property (A5): u + v = v + u. 


(b) The additive identity property (A3): v + 0 = v = 0 + v, where 0 is 
the zero vector (0,0,0,0). 




Exercise C45 


Prove that the following properties hold for vector addition in R4. 
(a) The associative property (A2): (u+v)+w=u+(v+w). 


(b) The additive inverses property (A4): $\mathbf{v} + (-\mathbf{v}) = \mathbf{0} = -\mathbf{v} + \mathbf{v}$, where $\mathbf{v} = (v_1, v_2, v_3, v_4)$ and $-\mathbf{v} = (-v_1, -v_2, -v_3, -v_4)$.


Recall from Unit B1 Symmetry that a set with a binary operation is a 
group if the following four axioms hold: 


G1 (closure); G2 (associativity); G3 (identity) and G4 (inverses). 


The first four properties (A1-A4) of vector addition in $\mathbb{R}^4$ show that the set $\mathbb{R}^4$ under the operation of vector addition satisfies these four axioms; that is, $(\mathbb{R}^4, +)$ is a group with identity the zero vector $(0, 0, 0, 0)$, and with $-\mathbf{v}$ the inverse of $\mathbf{v}$. The final property, commutativity (A5), shows that it is in fact an abelian group.


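These properties all involve vector addition, but $\mathbb{R}^4$ also has some properties that involve scalar multiplication.
Let $\mathbf{v} = (v_1, v_2, v_3, v_4) \in \mathbb{R}^4$ and $\alpha \in \mathbb{R}$. Then
$$\alpha\mathbf{v} = \alpha(v_1, v_2, v_3, v_4) = (\alpha v_1, \alpha v_2, \alpha v_3, \alpha v_4).$$
This vector also belongs to $\mathbb{R}^4$, so $\mathbb{R}^4$ is closed under scalar multiplication.
For example, if $\mathbf{v} = (1, 2, -5, -3) \in \mathbb{R}^4$ and $\alpha = 4$, then
$$\alpha\mathbf{v} = 4(1, 2, -5, -3) = (4, 8, -20, -12),$$
which belongs to $\mathbb{R}^4$.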


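Note that if you multiply a vector in $\mathbb{R}^4$ by $\beta \in \mathbb{R}$, and then by $\alpha \in \mathbb{R}$, you obtain the same result as multiplying by $\alpha\beta$. This is because, for all $\alpha, \beta \in \mathbb{R}$ and $\mathbf{v} = (v_1, v_2, v_3, v_4) \in \mathbb{R}^4$,
$$\begin{aligned}
\alpha(\beta\mathbf{v}) &= \alpha\big(\beta(v_1, v_2, v_3, v_4)\big) \\
&= \alpha(\beta v_1, \beta v_2, \beta v_3, \beta v_4) \\
&= (\alpha\beta v_1, \alpha\beta v_2, \alpha\beta v_3, \alpha\beta v_4) \\
&= (\alpha\beta)(v_1, v_2, v_3, v_4) \\
&= (\alpha\beta)\mathbf{v}.
\end{aligned}$$
For example, if $\mathbf{v} = (1, 2, -5, -3) \in \mathbb{R}^4$ and $\alpha = 4$, $\beta = -2$, then
$$\begin{aligned}
\alpha(\beta\mathbf{v}) &= 4\big(-2(1, 2, -5, -3)\big) \\
&= 4(-2, -4, 10, 6) \\
&= (-8, -16, 40, 24) \\
&= (-8)(1, 2, -5, -3) \\
&= (\alpha\beta)\mathbf{v}.
\end{aligned}$$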


Also, if $\mathbf{v} = (v_1, v_2, v_3, v_4)$, then
$$1\mathbf{v} = 1(v_1, v_2, v_3, v_4) = (v_1, v_2, v_3, v_4) = \mathbf{v}.$$


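These properties of scalar multiplication of vectors in $\mathbb{R}^4$ can be summarised as follows.

Scalar multiplication of vectors in $\mathbb{R}^4$

S1 Closure For all $\mathbf{v} \in \mathbb{R}^4$ and $\alpha \in \mathbb{R}$,
$\alpha\mathbf{v} \in \mathbb{R}^4$.

S2 Associativity For all $\mathbf{v} \in \mathbb{R}^4$ and $\alpha, \beta \in \mathbb{R}$,
$\alpha(\beta\mathbf{v}) = (\alpha\beta)\mathbf{v}$.

S3 Scalar multiplicative identity For all $\mathbf{v} \in \mathbb{R}^4$,
$1\mathbf{v} = \mathbf{v}$.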
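The same kind of spot-check works for S1 to S3. This sketch uses tuples as vectors, with `scale` as a hypothetical helper name. It repeats the example above, with $\mathbf{v} = (1, 2, -5, -3)$, $\alpha = 4$ and $\beta = -2$. As before, agreement for one choice of values illustrates the properties but does not prove them.

```python
def scale(alpha, v):
    """Scalar multiple alpha*v of a vector in R^4, component by component."""
    return tuple(alpha * c for c in v)


v = (1, 2, -5, -3)
alpha, beta = 4, -2
print(scale(alpha, v))               # S1: (4, 8, -20, -12), again in R^4
print(scale(alpha, scale(beta, v)))  # alpha(beta v) = (-8, -16, 40, 24)
print(scale(alpha * beta, v))        # (alpha beta)v = (-8, -16, 40, 24)
print(scale(1, v) == v)              # S3: True
```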


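Finally, there are two distributive properties that connect vector addition and scalar multiplication.
For example, if $\mathbf{u} = (1, 3, 5, 7)$ and $\mathbf{v} = (2, -1, -5, 6)$ are vectors in $\mathbb{R}^4$, and $\alpha = 3$ and $\beta = 4$, then
$$\begin{aligned}
\alpha(\mathbf{u} + \mathbf{v}) &= 3\big((1, 3, 5, 7) + (2, -1, -5, 6)\big) \\
&= 3(3, 2, 0, 13) \\
&= (9, 6, 0, 39)
\end{aligned}$$
and
$$\begin{aligned}
\alpha\mathbf{u} + \alpha\mathbf{v} &= 3(1, 3, 5, 7) + 3(2, -1, -5, 6) \\
&= (3, 9, 15, 21) + (6, -3, -15, 18) \\
&= (9, 6, 0, 39),
\end{aligned}$$
which illustrates the first distributive property. Also,
$$\begin{aligned}
(\alpha + \beta)\mathbf{v} &= (3 + 4)(2, -1, -5, 6) \\
&= 7(2, -1, -5, 6) \\
&= (14, -7, -35, 42)
\end{aligned}$$
and
$$\begin{aligned}
\alpha\mathbf{v} + \beta\mathbf{v} &= 3(2, -1, -5, 6) + 4(2, -1, -5, 6) \\
&= (6, -3, -15, 18) + (8, -4, -20, 24) \\
&= (14, -7, -35, 42),
\end{aligned}$$
which illustrates the second.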




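These properties connecting vector addition and scalar multiplication can be summarised as follows.

Combining addition and scalar multiplication of vectors in $\mathbb{R}^4$

D1 Distributivity For all $\mathbf{u}, \mathbf{v} \in \mathbb{R}^4$ and $\alpha \in \mathbb{R}$,
$\alpha(\mathbf{u} + \mathbf{v}) = \alpha\mathbf{u} + \alpha\mathbf{v}$.

D2 Distributivity For all $\mathbf{v} \in \mathbb{R}^4$ and $\alpha, \beta \in \mathbb{R}$,
$(\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}$.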
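Both distributive properties can be checked numerically for the worked example above ($\alpha = 3$, $\beta = 4$). Again, this is a sketch using tuples with hypothetical helper names, not a proof.

```python
def add(u, v):
    """Component-wise sum of two vectors in R^4."""
    return tuple(a + b for a, b in zip(u, v))


def scale(alpha, v):
    """Scalar multiple alpha*v, component by component."""
    return tuple(alpha * c for c in v)


u, v = (1, 3, 5, 7), (2, -1, -5, 6)
alpha, beta = 3, 4
# D1: alpha(u + v) = alpha u + alpha v
print(scale(alpha, add(u, v)), add(scale(alpha, u), scale(alpha, v)))
# (9, 6, 0, 39) (9, 6, 0, 39)
# D2: (alpha + beta)v = alpha v + beta v
print(scale(alpha + beta, v), add(scale(alpha, v), scale(beta, v)))
# (14, -7, -35, 42) (14, -7, -35, 42)
```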


The space of quadratic polynomials
Let us now look at another, apparently very different, set of elements. This is the set of quadratic polynomials, namely, functions of the form
$$p : \mathbb{R} \to \mathbb{R}, \qquad x \mapsto a + bx + cx^2,$$
where $a, b, c \in \mathbb{R}$. We call this set $P_3$ because it comprises all the real polynomials of degree less than 3. Thus
$$P_3 = \{p(x) : p(x) = a + bx + cx^2,\ a, b, c \in \mathbb{R}\}.$$


Here we have used the convention from Book A that when a real function 
is specified only by a rule, it is understood that the domain of the function 
is the set of all real numbers for which the rule is applicable, and the 
codomain of the function is R. 


(We write the terms of the polynomial in increasing order of powers here, as is usual when working within a vector space of polynomials.)


To simplify the notation further, we write
$$P_3 = \{a + bx + cx^2 : a, b, c \in \mathbb{R}\}.$$


This set includes the quadratic polynomials (where $c$ is non-zero), the linear polynomials (where $c$ is 0 and $b$ is non-zero) and constants (where $b$ and $c$ are 0 and $a$ is non-zero), as well as the zero polynomial (where $a = b = c = 0$). At first sight, there is no reason why this set of elements should have the properties that we have just shown are satisfied by $\mathbb{R}^4$; however, these properties all hold for this set as well.


First we consider the properties A1-A5 involving addition.
If $p_1(x) = a_1 + b_1x + c_1x^2$ and $p_2(x) = a_2 + b_2x + c_2x^2$, then
$$\begin{aligned}
p_1(x) + p_2(x) &= (a_1 + b_1x + c_1x^2) + (a_2 + b_2x + c_2x^2) \\
&= (a_1 + a_2) + (b_1 + b_2)x + (c_1 + c_2)x^2,
\end{aligned}$$
which also belongs to $P_3$. Therefore the closure property (A1) holds for addition in $P_3$.




For example, $3 + 4x - 2x^2$ and $5 - 3x + 7x^2$ both belong to $P_3$, and
$$(3 + 4x - 2x^2) + (5 - 3x + 7x^2) = 8 + x + 5x^2,$$

which also belongs to P3. The next worked exercise proves the 
commutative property (A5) and the additive inverses property (A4), and 
you are asked to prove the remaining two properties in the following 
exercise. 


Worked Exercise C23 


Prove that the following properties hold for addition in $P_3$.

(a) The commutative property (A5): $p_1(x) + p_2(x) = p_2(x) + p_1(x)$.
(b) The additive inverses property (A4):
$p_1(x) + (-p_1(x)) = 0 = -p_1(x) + p_1(x)$.


Exercise C46 


Prove that the following properties hold for addition in $P_3$.

(a) The associative property (A2):
$(p_1(x) + p_2(x)) + p_3(x) = p_1(x) + (p_2(x) + p_3(x))$.

(b) The additive identity property (A3): $p_1(x) + 0 = p_1(x) = 0 + p_1(x)$, where $0 = 0 + 0x + 0x^2$ is the zero polynomial in $P_3$.




It follows that $P_3$ satisfies the same addition properties as $\mathbb{R}^4$, and therefore $P_3$ is also an abelian group under addition.

We can multiply a polynomial through by a real constant; that is, by a scalar. In fact, $P_3$ has the same properties involving scalar multiplication as $\mathbb{R}^4$.


Let $p(x) = a + bx + cx^2$ and $\alpha \in \mathbb{R}$. Then
$$\alpha p(x) = \alpha(a + bx + cx^2) = (\alpha a) + (\alpha b)x + (\alpha c)x^2,$$
which also belongs to $P_3$. So $P_3$ is closed under scalar multiplication; that is, the closure property (S1) holds for $P_3$.
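A polynomial $a + bx + cx^2$ in $P_3$ is determined by its three coefficients. One convenient representation, assumed for this sketch and not required by the text, stores it as the triple $(a, b, c)$. Addition and scalar multiplication then act coefficient by coefficient, exactly as in $\mathbb{R}^3$. The sketch also confirms, at a few sample points, that adding coefficients agrees with adding the functions. The helper names are hypothetical.

```python
def poly_add(p, q):
    """Sum in P_3, with a + bx + cx^2 stored as the triple (a, b, c)."""
    return tuple(s + t for s, t in zip(p, q))


def poly_scale(alpha, p):
    """Scalar multiple alpha * p(x), coefficient by coefficient."""
    return tuple(alpha * s for s in p)


def evaluate(p, x):
    """Value of a + bx + cx^2 at x."""
    a, b, c = p
    return a + b * x + c * x ** 2


p1 = (3, 4, -2)   # 3 + 4x - 2x^2
p2 = (5, -3, 7)   # 5 - 3x + 7x^2
print(poly_add(p1, p2))    # (8, 1, 5), that is, 8 + x + 5x^2
print(poly_scale(2, p1))   # (6, 8, -4), that is, 6 + 8x - 4x^2
# Adding coefficients agrees with adding the two functions pointwise.
print(all(evaluate(poly_add(p1, p2), x) == evaluate(p1, x) + evaluate(p2, x)
          for x in range(-5, 6)))   # True
```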


In the following exercise you are asked to check the remaining properties 
involving scalar multiplication (S2 and S3), for a particular case. 


Exercise C47 


Let $p(x) = 1 - x + 2x^2$ and $\alpha = 2$, $\beta = -3$. Show that the following properties hold for these scalars and this quadratic polynomial.

(a) The identity property (S3): $1p(x) = p(x)$.
(b) The associative property (S2): $\alpha(\beta p(x)) = (\alpha\beta)p(x)$.


To finish looking at the properties of $P_3$, we note that the distributive properties (D1 and D2) that connect addition and scalar multiplication hold for $P_3$; the proofs simply involve multiplying out brackets. For all $p_1(x), p_2(x) \in P_3$ and $\alpha, \beta \in \mathbb{R}$,
$$\alpha(p_1(x) + p_2(x)) = \alpha p_1(x) + \alpha p_2(x)$$
and
$$(\alpha + \beta)p_1(x) = \alpha p_1(x) + \beta p_1(x).$$


So $\mathbb{R}^4$ and $P_3$ satisfy the same set of properties with respect to addition and scalar multiplication, even though $\mathbb{R}^4$ is a Euclidean space and $P_3$ is a set of polynomials. The idea that connects them is the concept of a vector space.


Vector space definition 


In Book B we studied symmetries of geometric figures, and then abstracted the properties to obtain the definition of a group. We go through a similar process here. We have just studied $\mathbb{R}^4$ and $P_3$, and we now abstract from them the definition of a vector space. We then go on to look at other examples of vector spaces. The elements of these vector spaces are of diverse types: complex numbers, functions, matrices, and many others.


The definition of a vector space is one of the longest definitions in mathematics. It looks formidable, but the axioms A1-A5, S1-S3 and D1-D2 are precisely the properties we checked for $\mathbb{R}^4$ and $P_3$. Thus this definition follows naturally from our previous examples. As for $\mathbb{R}^4$ and $P_3$, axioms A1-A5 refer to vector addition (implying that a vector space is an abelian group under addition), S1-S3 refer to scalar multiplication, and D1-D2 to how we combine these operations. Therefore a vector space is a set of objects called vectors that can be added together and multiplied by scalars in such a way that all the usual properties of arithmetic hold. Thus the definition includes the properties for addition, the properties for scalar multiplication and the properties of how these two operations combine.


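Definition
A real vector space consists of a set $V$ of elements called vectors and two operations, vector addition and scalar multiplication, such that the following axioms hold.

Axioms for addition

A1 Closure For all $\mathbf{v}_1, \mathbf{v}_2 \in V$,
$\mathbf{v}_1 + \mathbf{v}_2 \in V$.

A2 Associativity For all $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3 \in V$,
$(\mathbf{v}_1 + \mathbf{v}_2) + \mathbf{v}_3 = \mathbf{v}_1 + (\mathbf{v}_2 + \mathbf{v}_3)$.

A3 Additive identity There is a zero element $\mathbf{0} \in V$ such that, for all $\mathbf{v} \in V$,
$\mathbf{v} + \mathbf{0} = \mathbf{v} = \mathbf{0} + \mathbf{v}$.

A4 Additive inverses For each $\mathbf{v} \in V$, there is an element $-\mathbf{v} \in V$ (its additive inverse) such that
$\mathbf{v} + (-\mathbf{v}) = \mathbf{0} = -\mathbf{v} + \mathbf{v}$.

A5 Commutativity For all $\mathbf{v}_1, \mathbf{v}_2 \in V$,
$\mathbf{v}_1 + \mathbf{v}_2 = \mathbf{v}_2 + \mathbf{v}_1$.

Axioms A1-A5 imply that $(V, +)$ is an abelian group.

Axioms for scalar multiplication

S1 Closure For all $\mathbf{v} \in V$ and $\alpha \in \mathbb{R}$,
$\alpha\mathbf{v} \in V$.

S2 Associativity For all $\mathbf{v} \in V$ and $\alpha, \beta \in \mathbb{R}$,
$\alpha(\beta\mathbf{v}) = (\alpha\beta)\mathbf{v}$.

S3 Scalar multiplicative identity For all $\mathbf{v} \in V$,
$1\mathbf{v} = \mathbf{v}$.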




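Axioms combining addition and scalar multiplication

D1 Distributivity For all $\mathbf{v}_1, \mathbf{v}_2 \in V$ and $\alpha \in \mathbb{R}$,
$\alpha(\mathbf{v}_1 + \mathbf{v}_2) = \alpha\mathbf{v}_1 + \alpha\mathbf{v}_2$.

D2 Distributivity For all $\mathbf{v} \in V$ and $\alpha, \beta \in \mathbb{R}$,
$(\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}$.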


The word ‘real’ in this definition refers to the fact that the scalars used in forming scalar multiples are real numbers; that is, a real vector space is a vector space over the field $\mathbb{R}$ (which means that the scalars are elements of $\mathbb{R}$). More generally, it is possible to define a vector space over any field. Because the sets of complex and rational numbers are also fields, it is possible to form complex and rational vector spaces, where the vectors are multiplied by complex and rational scalars, respectively. However, we are only concerned with real vector spaces in this module.

It is worth noting that R itself is a real vector space: the fact that the 
vector space axioms hold for V = R follows from the field properties that 
hold for R, which were shown in Unit A2 when considering the arithmetic 
of real numbers. 

Although we use the term vector for the elements of vector spaces, many mathematical texts use the terms element and vector interchangeably.


Checking the axioms 


We now look at the set $V = \{a\cos x + b\sin x : a, b \in \mathbb{R}\}$ of functions, and show that it is a real vector space by checking all the axioms in the definition. You will not be asked to check all these axioms in a single exercise: this example simply illustrates how it can be done.


Addition and scalar multiplication are defined on $V$ as follows.

If $a_1\cos x + b_1\sin x$ and $a_2\cos x + b_2\sin x$ are vectors of $V$, and $\alpha \in \mathbb{R}$, then
$$(a_1\cos x + b_1\sin x) + (a_2\cos x + b_2\sin x) = (a_1 + a_2)\cos x + (b_1 + b_2)\sin x$$
and
$$\alpha(a_1\cos x + b_1\sin x) = \alpha a_1\cos x + \alpha b_1\sin x.$$
For example,
$$(3\cos x + 2\sin x) + (4\cos x - 6\sin x) = 7\cos x - 4\sin x$$
and
$$-5(3\cos x + 4\sin x) = -15\cos x - 20\sin x.$$


We check the axioms one by one. 


A1 Closure $V$ is closed under addition of functions, since, if $a_1\cos x + b_1\sin x$ and $a_2\cos x + b_2\sin x$ are vectors of $V$, then
$$(a_1\cos x + b_1\sin x) + (a_2\cos x + b_2\sin x) = (a_1 + a_2)\cos x + (b_1 + b_2)\sin x,$$
which is a vector of $V$.

A2 Associativity Addition is associative, since, if $a_1\cos x + b_1\sin x$, $a_2\cos x + b_2\sin x$ and $a_3\cos x + b_3\sin x$ are vectors of $V$, then
$$\begin{aligned}
&\big((a_1\cos x + b_1\sin x) + (a_2\cos x + b_2\sin x)\big) + (a_3\cos x + b_3\sin x) \\
&\quad= \big((a_1 + a_2)\cos x + (b_1 + b_2)\sin x\big) + (a_3\cos x + b_3\sin x) \\
&\quad= (a_1 + a_2 + a_3)\cos x + (b_1 + b_2 + b_3)\sin x
\end{aligned}$$
and
$$\begin{aligned}
&(a_1\cos x + b_1\sin x) + \big((a_2\cos x + b_2\sin x) + (a_3\cos x + b_3\sin x)\big) \\
&\quad= (a_1\cos x + b_1\sin x) + \big((a_2 + a_3)\cos x + (b_2 + b_3)\sin x\big) \\
&\quad= (a_1 + a_2 + a_3)\cos x + (b_1 + b_2 + b_3)\sin x.
\end{aligned}$$

A3 Additive identity The zero vector is $0\cos x + 0\sin x$, since this is in $V$ and, if $a\cos x + b\sin x \in V$, then
$$(a\cos x + b\sin x) + (0\cos x + 0\sin x) = a\cos x + b\sin x$$
and
$$(0\cos x + 0\sin x) + (a\cos x + b\sin x) = a\cos x + b\sin x.$$

A4 Additive inverses The additive inverse of $a\cos x + b\sin x$ is $-a\cos x - b\sin x$, since this is in $V$ and, if $a\cos x + b\sin x \in V$, then
$$(a\cos x + b\sin x) + (-a\cos x - b\sin x) = 0\cos x + 0\sin x$$
and
$$(-a\cos x - b\sin x) + (a\cos x + b\sin x) = 0\cos x + 0\sin x.$$

A5 Commutativity Addition is commutative, since, if $a_1\cos x + b_1\sin x$ and $a_2\cos x + b_2\sin x$ are vectors of $V$, then
$$(a_1\cos x + b_1\sin x) + (a_2\cos x + b_2\sin x) = (a_1 + a_2)\cos x + (b_1 + b_2)\sin x$$
and
$$\begin{aligned}
(a_2\cos x + b_2\sin x) + (a_1\cos x + b_1\sin x) &= (a_2 + a_1)\cos x + (b_2 + b_1)\sin x \\
&= (a_1 + a_2)\cos x + (b_1 + b_2)\sin x.
\end{aligned}$$

S1 Closure $V$ is closed under scalar multiplication, since, for $a\cos x + b\sin x \in V$ and $\alpha \in \mathbb{R}$, we have
$$\alpha(a\cos x + b\sin x) = \alpha a\cos x + \alpha b\sin x.$$
This is in $V$, since $\alpha a, \alpha b \in \mathbb{R}$.




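S2 Associativity For $\alpha, \beta \in \mathbb{R}$ and $a\cos x + b\sin x \in V$, we have
$$\begin{aligned}
\alpha\big(\beta(a\cos x + b\sin x)\big) &= \alpha(\beta a\cos x + \beta b\sin x) \\
&= \alpha\beta a\cos x + \alpha\beta b\sin x \\
&= (\alpha\beta)(a\cos x + b\sin x).
\end{aligned}$$

S3 Scalar multiplicative identity For $a\cos x + b\sin x \in V$, we have
$$1(a\cos x + b\sin x) = a\cos x + b\sin x.$$

D1 Distributivity For $\alpha \in \mathbb{R}$ and $a_1\cos x + b_1\sin x$ and $a_2\cos x + b_2\sin x$ in $V$, we have
$$\begin{aligned}
&\alpha\big((a_1\cos x + b_1\sin x) + (a_2\cos x + b_2\sin x)\big) \\
&\quad= \alpha\big((a_1 + a_2)\cos x + (b_1 + b_2)\sin x\big) \\
&\quad= \alpha(a_1 + a_2)\cos x + \alpha(b_1 + b_2)\sin x
\end{aligned}$$
and
$$\begin{aligned}
&\alpha(a_1\cos x + b_1\sin x) + \alpha(a_2\cos x + b_2\sin x) \\
&\quad= \alpha a_1\cos x + \alpha b_1\sin x + \alpha a_2\cos x + \alpha b_2\sin x \\
&\quad= \alpha(a_1 + a_2)\cos x + \alpha(b_1 + b_2)\sin x.
\end{aligned}$$

D2 Distributivity For $\alpha, \beta \in \mathbb{R}$ and $a\cos x + b\sin x \in V$, we have
$$\begin{aligned}
(\alpha + \beta)(a\cos x + b\sin x) &= (\alpha + \beta)a\cos x + (\alpha + \beta)b\sin x \\
&= \alpha a\cos x + \alpha b\sin x + \beta a\cos x + \beta b\sin x
\end{aligned}$$
and
$$\alpha(a\cos x + b\sin x) + \beta(a\cos x + b\sin x) = \alpha a\cos x + \alpha b\sin x + \beta a\cos x + \beta b\sin x.$$

Since all the vector space axioms are satisfied, $V$ is a vector space.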
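The coefficient idea used for $P_3$ also applies to $V$. In this sketch, the function $a\cos x + b\sin x$ is stored as the pair $(a, b)$, which is again our own choice of representation, and the helper names are hypothetical. The sketch reproduces the two examples given earlier. It also checks, at sample points and up to floating-point rounding, that pair arithmetic agrees with adding the functions themselves.

```python
import math


def add(f, g):
    """Sum in V, with a cos x + b sin x stored as the pair (a, b)."""
    return (f[0] + g[0], f[1] + g[1])


def scale(alpha, f):
    """Scalar multiple alpha * (a cos x + b sin x)."""
    return (alpha * f[0], alpha * f[1])


def evaluate(f, x):
    """Value of a cos x + b sin x at x."""
    a, b = f
    return a * math.cos(x) + b * math.sin(x)


f, g = (3, 2), (4, -6)
print(add(f, g))          # (7, -4): 7 cos x - 4 sin x
print(scale(-5, (3, 4)))  # (-15, -20): -15 cos x - 20 sin x

# Pointwise check, allowing for floating-point rounding.
xs = [k / 10 for k in range(-30, 31)]
print(all(math.isclose(evaluate(add(f, g), x), evaluate(f, x) + evaluate(g, x),
                       abs_tol=1e-12) for x in xs))  # True
```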


We now look briefly at some further examples of vector spaces, to give you 
some idea of the different areas of mathematics in which this concept arises. 


The set of linear polynomials $P_2$

The set $P_2$ of linear polynomials comprises the real polynomials of degree less than 2; that is, the polynomials of the form $p(x) = a + bx$, where $a, b \in \mathbb{R}$. Vector addition and scalar multiplication are defined on $P_2$ as follows.

If $p(x) = a + bx$ and $q(x) = c + dx$, and $\alpha \in \mathbb{R}$, then
$$p(x) + q(x) = (a + bx) + (c + dx) = (a + c) + (b + d)x$$
and
$$\alpha p(x) = \alpha(a + bx) = (\alpha a) + (\alpha b)x.$$

The result of each of these operations is a linear polynomial, so $P_2$ is closed under the operations of addition and scalar multiplication, and therefore satisfies the closure axioms (A1 and S1). The other axioms can be checked in the same way.

More generally, for each positive integer $n$, the set $P_n$ of real polynomials of degree less than $n$, with the usual operations of addition and scalar multiplication, is a vector space.


The set of complex numbers $\mathbb{C}$

The set $\mathbb{C}$ comprises the numbers of the form $a + bi$, where $i^2 = -1$ and $a, b \in \mathbb{R}$. Vector addition and scalar multiplication are defined on $\mathbb{C}$ as
$$(a + bi) + (c + di) = (a + c) + (b + d)i$$
and
$$\alpha(a + bi) = (\alpha a) + (\alpha b)i.$$
This is a real vector space because we multiply the complex number (the vector) by a real number (the scalar).

The result of each of these operations is a complex number, so $\mathbb{C}$ is closed under the operations of vector addition and scalar multiplication, and therefore satisfies the closure axioms (A1 and S1). The other axioms can be checked in the same way.
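Python's built-in `complex` type already implements these operations. In this small sketch we only ever multiply by a real scalar, to match the real vector space structure described above. We then compare the results with the corresponding component-wise operations on the pairs $(a, b)$ of real and imaginary parts. The helper name `as_pair` is hypothetical.

```python
def as_pair(z):
    """Real and imaginary parts of z, i.e. the pair (a, b) for z = a + bi."""
    return (z.real, z.imag)


z, w = complex(2, 3), complex(-1, 5)
alpha = 4.0  # a real scalar
print(z + w, alpha * z)   # (1+8j) (8+12j)
# The complex operations match component-wise operations on the pairs (a, b).
print(as_pair(z + w) == (z.real + w.real, z.imag + w.imag))    # True
print(as_pair(alpha * z) == (alpha * z.real, alpha * z.imag))  # True
```

Seen through `as_pair`, vector addition and real scalar multiplication on $\mathbb{C}$ act exactly like the corresponding operations on $\mathbb{R}^2$.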


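The set $M_{2,3}$ of $2 \times 3$ matrices with real entries

The set $M_{2,3}$ comprises the $2 \times 3$ matrices of the form
$$\begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix}, \quad\text{where } a, b, c, d, e, f \in \mathbb{R}.$$

Vector addition and scalar multiplication are defined on $M_{2,3}$ as follows.

If $A = \begin{pmatrix} a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 \end{pmatrix}$ and $B = \begin{pmatrix} b_1 & b_2 & b_3 \\ b_4 & b_5 & b_6 \end{pmatrix}$, and $\alpha \in \mathbb{R}$, then
$$A + B = \begin{pmatrix} a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 \end{pmatrix} + \begin{pmatrix} b_1 & b_2 & b_3 \\ b_4 & b_5 & b_6 \end{pmatrix} = \begin{pmatrix} a_1 + b_1 & a_2 + b_2 & a_3 + b_3 \\ a_4 + b_4 & a_5 + b_5 & a_6 + b_6 \end{pmatrix}$$
and
$$\alpha A = \alpha\begin{pmatrix} a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 \end{pmatrix} = \begin{pmatrix} \alpha a_1 & \alpha a_2 & \alpha a_3 \\ \alpha a_4 & \alpha a_5 & \alpha a_6 \end{pmatrix}.$$
The result of each of these operations is a $2 \times 3$ matrix with real entries, so this set is closed under the operations of vector addition and scalar multiplication, and therefore satisfies the closure axioms (A1 and S1). The other axioms can be checked in the same way.

More generally, for positive integers $m$ and $n$, the set $M_{m,n}$ of $m \times n$ matrices with real entries is a vector space under the usual operations of matrix addition and scalar multiplication.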
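For matrices stored as lists of rows, both operations work entry by entry. The sample matrices in this sketch are hypothetical, and `mat_add` and `mat_scale` are our own names. `mat_add` rejects matrices of different sizes, since addition is only defined within a single set $M_{m,n}$.

```python
def mat_add(A, B):
    """Entry-by-entry sum of two m x n matrices stored as lists of rows."""
    if len(A) != len(B) or any(len(r) != len(s) for r, s in zip(A, B)):
        raise ValueError("matrices must have the same size")
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]


def mat_scale(alpha, A):
    """Scalar multiple alpha*A, entry by entry."""
    return [[alpha * a for a in row] for row in A]


A = [[1, 2, 3], [4, 5, 6]]    # hypothetical elements of M_{2,3}
B = [[0, -1, 2], [7, 0, -6]]
print(mat_add(A, B))     # [[1, 1, 5], [11, 5, 0]]
print(mat_scale(3, A))   # [[3, 6, 9], [12, 15, 18]]
```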




The set $\mathbb{R}^\infty$

If $\mathbf{u} = (u_1, u_2, \ldots)$ and $\mathbf{v} = (v_1, v_2, \ldots)$ belong to $\mathbb{R}^\infty$, and $\alpha \in \mathbb{R}$, then
$$\mathbf{u} + \mathbf{v} = (u_1, u_2, \ldots) + (v_1, v_2, \ldots) = (u_1 + v_1, u_2 + v_2, \ldots)$$
and
$$\alpha\mathbf{u} = \alpha(u_1, u_2, \ldots) = (\alpha u_1, \alpha u_2, \ldots).$$

The result of each of these operations is a vector of $\mathbb{R}^\infty$, so $\mathbb{R}^\infty$ is closed under the operations of vector addition and scalar multiplication, and therefore satisfies the closure axioms (A1 and S1). The other axioms can be checked in the same way.


These examples are only a few of the many real vector spaces. You will 
meet more of them as you work through this unit, and as you encounter 
other mathematical concepts in the remainder of this module. 


We finish this section by looking at some sets that are not vector spaces. In each case, assume the usual definitions of addition and scalar multiplication for the elements of the set.


Worked Exercise C24 


Show that neither of the following sets is a real vector space. 


(a) V = {all polynomials of degree equal to 5} 
(b) $V = \{a + bi \in \mathbb{C} : a > 0\}$




Exercise C48 


Show that neither of the following sets is a real vector space. 
(a) $V = \{(x, y) \in \mathbb{R}^2 : y = 2x + 1\}$


(b) v=4(5 c) sab cez} 


2 Linear combinations and spanning 
sets 


In this section you will see that in a vector space, some sets of vectors are special: every vector in the space can be produced by adding together scalar multiples of vectors taken from such a set.


2.1 Linear combinations 


We begin by looking at the different ways in which we can express a single vector in $\mathbb{R}^2$ as a combination of two other vectors.

For example, the vector $(5, 3)$ in $\mathbb{R}^2$, illustrated in Figure 1, can be written as
$$(5, 3) = 5(1, 0) + 3(0, 1).$$


Figure 1 The vector (5, 3) as a linear combination of (1, 0) and (0, 1)


Figure 2 The vector (5, 3) as a linear combination of (2, 0) and (1, 1)

Figure 3 A vector in $\mathbb{R}^3$ as a linear combination of three vectors

Figure 4 A vector in $\mathbb{R}^3$ as a linear combination of three vectors


We could also write $(5, 3)$ in terms of $(2, 0)$ and $(1, 1)$, illustrated in Figure 2. In this case we have
$$(5, 3) = 1(2, 0) + 3(1, 1).$$

If you look at the right-hand sides of these equations, you will see that they both have the same form. In each case we have written
$$(5, 3) = \alpha\mathbf{v}_1 + \beta\mathbf{v}_2,$$
where $\mathbf{v}_1 = (1, 0)$, $\mathbf{v}_2 = (0, 1)$, $\alpha = 5$ and $\beta = 3$ in the first case, and $\mathbf{v}_1 = (2, 0)$, $\mathbf{v}_2 = (1, 1)$, $\alpha = 1$ and $\beta = 3$ in the second case.
We call $\alpha\mathbf{v}_1 + \beta\mathbf{v}_2$ a linear combination of the two vectors $\mathbf{v}_1$ and $\mathbf{v}_2$.


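Because $\mathbf{v}_1$ and $\mathbf{v}_2$ are vectors in $\mathbb{R}^2$, so are $\alpha\mathbf{v}_1$ and $\beta\mathbf{v}_2$, since they are scalar multiples of $\mathbf{v}_1$ and $\mathbf{v}_2$. Hence $\alpha\mathbf{v}_1 + \beta\mathbf{v}_2$, being the sum of two vectors in $\mathbb{R}^2$, is also a vector in $\mathbb{R}^2$.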


Similarly, in $\mathbb{R}^3$, the vector $(-1, -4, 4)$, illustrated in Figure 3, can be written as
$$(-1, -4, 4) = -1(1, 0, 0) - 4(0, 1, 0) + 4(0, 0, 1)$$
or, as illustrated in Figure 4, in terms of the three vectors $(1, 0, 2)$, $(0, -1, 3)$ and $(1, 1, 1)$ as
$$(-1, -4, 4) = 2(1, 0, 2) + 1(0, -1, 3) - 3(1, 1, 1).$$


These are two examples: they are not the only possibilities. Each of these equations has the form
$$(-1, -4, 4) = \alpha\mathbf{v}_1 + \beta\mathbf{v}_2 + \gamma\mathbf{v}_3,$$
where the expression on the right-hand side of the equation is a linear combination of three vectors.


These linear combinations of vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$ are particular examples of the following definition.


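Definition
Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ belong to a vector space $V$. Then a linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ is a vector of the form
$$\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k,$$
where $\alpha_1, \alpha_2, \ldots, \alpha_k$ are real numbers. This vector also belongs to $V$.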
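In $\mathbb{R}^n$, computing a linear combination is just repeated scalar multiplication followed by addition. The minimal Python sketch below represents vectors as tuples, and `linear_combination` is a hypothetical helper name. It reproduces the three combinations given earlier in this subsection.

```python
def linear_combination(scalars, vectors):
    """Return alpha_1 v_1 + ... + alpha_k v_k for vectors in R^n (tuples)."""
    if len(scalars) != len(vectors) or not vectors:
        raise ValueError("need one scalar per vector, and at least one vector")
    n = len(vectors[0])
    return tuple(sum(a * v[i] for a, v in zip(scalars, vectors))
                 for i in range(n))


print(linear_combination([5, 3], [(1, 0), (0, 1)]))   # (5, 3)
print(linear_combination([1, 3], [(2, 0), (1, 1)]))   # (5, 3)
print(linear_combination([2, 1, -3],
                         [(1, 0, 2), (0, -1, 3), (1, 1, 1)]))  # (-1, -4, 4)
```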


We begin by looking at how we can form linear combinations of vectors, 
and then investigate whether we can write a particular vector as a linear 
combination of other vectors in the same vector space. 


In the worked exercises and exercises of this section we have tried to keep 
the arithmetic simple by using integer scalar multiples and coordinates. In 
general, any real numbers may occur. 




Worked Exercise C25 


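(a) In $\mathbb{R}^3$, calculate the linear combination $2\mathbf{v}_1 + 3\mathbf{v}_2$ when $\mathbf{v}_1 = (1, 0, 3)$ and $\mathbf{v}_2 = (0, 2, -1)$.

(b) In $\mathbb{R}^4$, calculate the linear combination $2\mathbf{v}_1 + 3\mathbf{v}_2 + 4\mathbf{v}_3 - \mathbf{v}_4$ when $\mathbf{v}_1 = (1, 0, 3, 1)$, $\mathbf{v}_2 = (0, 2, 0, -1)$, $\mathbf{v}_3 = (0, 1, -2, 0)$ and $\mathbf{v}_4 = (2, 10, -2, -1)$.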


Exercise C49 


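(a) In $\mathbb{R}^2$, let $\mathbf{v}_1 = (0, 3)$ and $\mathbf{v}_2 = (2, 1)$. Calculate the linear combination $4\mathbf{v}_1 - 2\mathbf{v}_2$.

(b) In $\mathbb{R}^4$, let $\mathbf{v}_1 = (1, 2, 1, 3)$ and $\mathbf{v}_2 = (2, 1, 0, -1)$. Calculate the linear combination $3\mathbf{v}_1 + 2\mathbf{v}_2$.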


We now look at linear combinations of vectors in vector spaces other than $\mathbb{R}^2$, $\mathbb{R}^3$ and $\mathbb{R}^4$. In the worked exercise and exercise that follow, we assume that the operations of vector addition and scalar multiplication for polynomials, matrices and functions are the usual ones.


Worked Exercise C26 


For each of the following vector spaces $V$ and vectors $\mathbf{v}_1$, $\mathbf{v}_2$ and $\mathbf{v}_3$ in $V$, form the linear combination $3\mathbf{v}_1 - 2\mathbf{v}_2 + \mathbf{v}_3$.


(a) V=Ps, vi=l+2t+2", v=l-r, vs3=24+2". 


i 0 2 2 -1 0 
(b) V = M23, “i= (4 = J v= ( 3 2 


aaf 9 0 
a AG Day" 




Exercise C50 


For each of the following vector spaces $V$ and vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in $V$, form the linear combination $2\mathbf{v}_1 - 4\mathbf{v}_2$.


(a) $V = P_3$, $\mathbf{v}_1 = 2 - x + 3x^2$, $\mathbf{v}_2 = -1 + x$.


(b) V is the set of all real functions, vı =sinz, V2 = g£ cosg. 


—1 1 3 1 
(c) V = M29, vi=( 2 v= (i E 


Now that we have formed linear combinations of different numbers of 
vectors in various vector spaces, we consider the harder problem of 
deciding whether we can express a given vector as a linear combination of 
a particular set of vectors. In the next worked exercise, we look at an 
example before giving a general strategy. 


Worked Exercise C27 


Determine whether $(3, -1)$ can be expressed as a linear combination of each of the following.

(a) $\mathbf{v}_1 = (2, 0)$ and $\mathbf{v}_2 = (1, 1)$.
(b) $\mathbf{v}_1 = (2, 2)$ and $\mathbf{v}_2 = (1, 1)$.
(c) $\mathbf{v}_1 = (9, -3)$ and $\mathbf{v}_2 = (-6, 2)$.




Solution 


(a) We need to find real numbers $\alpha$ and $\beta$ such that
$$(3, -1) = \alpha(2, 0) + \beta(1, 1),$$
that is,
$$(3, -1) = (2\alpha + \beta, \beta).$$

(We equate the two first coordinates (components) to get $3 = 2\alpha + \beta$, and then the two second coordinates (components) to get $-1 = \beta$.)

Equating corresponding coordinates, we obtain the system of linear equations
$$\begin{aligned} 2\alpha + \beta &= 3 \\ \beta &= -1. \end{aligned}$$
Substituting $\beta = -1$ in the first equation gives $\alpha = 2$. So
$$(3, -1) = 2(2, 0) - 1(1, 1) = 2\mathbf{v}_1 - \mathbf{v}_2.$$

(b) We need to find real numbers $\alpha$ and $\beta$ such that
$$(3, -1) = \alpha(2, 2) + \beta(1, 1);$$
that is,
$$(3, -1) = (2\alpha + \beta, 2\alpha + \beta).$$
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} 2\alpha + \beta &= 3 \\ 2\alpha + \beta &= -1. \end{aligned}$$

(The left-hand sides of these equations are the same but the right-hand sides are different, so we can immediately conclude that they are inconsistent. Alternatively, subtracting the second equation from the first yields the equation $0 = 4$.)

This pair of equations is inconsistent, since no values of $\alpha$ and $\beta$ satisfy both of them.

(We might have expected this, since any linear combination of $(2, 2)$ and $(1, 1)$ must have both coordinates the same.)

We cannot express $(3, -1)$ as a linear combination of these two vectors.

(c) We need to find real numbers $\alpha$ and $\beta$ such that
$$(3, -1) = \alpha(9, -3) + \beta(-6, 2),$$
that is,
$$(3, -1) = (9\alpha - 6\beta, -3\alpha + 2\beta).$$




The following strategy describes the method we have just used. 


Strategy C6 


To determine whether a given vector $\mathbf{v}$ can be written as a linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$:

1. write $\mathbf{v} = \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k$

2. use this expression to write down a system of linear equations in the unknowns $\alpha_1, \alpha_2, \ldots, \alpha_k$

3. solve the resulting system of equations, if possible.

Then $\mathbf{v}$ can be written as a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ if and only if the system has a solution.


Recall from Unit C1 Linear equations and matrices that a system of linear 
equations may have no solution, a unique solution, or infinitely many 
solutions. Therefore this strategy may give no solution, a unique solution, 
or infinitely many solutions, as we saw in Worked Exercise C27. 




2 Linear combinations and spanning sets 


When dealing with polynomial functions, such as those in $P_3$, we use the fact that two polynomials in the variable $x$ are equal if and only if the coefficients of corresponding powers of $x$ are equal; so we equate corresponding coefficients.
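Equating coefficients amounts to treating a polynomial in $P_3$ as its list of coefficients. The sketch below assumes that representation (a tuple with the constant term first; the helper name `combine` is hypothetical) and checks the values $\alpha = 2$, $\beta = 1$ for Worked Exercise C28(b). These come from comparing constant terms ($\alpha = 2$), $x$ terms ($2\beta = 2$) and $x^2$ terms ($3\alpha - \beta = 5$, which these values satisfy).

```python
from fractions import Fraction


def combine(coeffs, polys):
    """Return the coefficient tuple of sum(c_j * p_j).

    Each polynomial is a tuple of coefficients, constant term first, so
    1 + 3x^2 is (1, 0, 3) and 2x - x^2 is (0, 2, -1).
    """
    length = max(len(p) for p in polys)
    total = [Fraction(0)] * length
    for c, p in zip(coeffs, polys):
        for power, a in enumerate(p):
            total[power] += Fraction(c) * a
    return tuple(total)


# 2 + 2x + 5x^2 = alpha(1 + 3x^2) + beta(2x - x^2) with alpha = 2, beta = 1
alpha, beta = 2, 1
print(combine([alpha, beta], [(1, 0, 3), (0, 2, -1)]) == (2, 2, 5))
```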


Worked Exercise C28 


(a) In $\mathbb{R}^3$, express the vector (1,1,1) as a linear combination of the vectors (1,0,1), (0,1,2) and (-1,1,0).

(b) In $P_3$, express the polynomial $2 + 2x + 5x^2$ as a linear combination of the polynomials $1 + 3x^2$ and $2x - x^2$.


Solution 
We follow the steps of Strategy C6. 


(a) Let $\alpha$, $\beta$ and $\gamma$ be real numbers such that
$$(1,1,1) = \alpha(1,0,1) + \beta(0,1,2) + \gamma(-1,1,0).$$
Then
$$(1,1,1) = (\alpha - \gamma, \beta + \gamma, \alpha + 2\beta).$$


Equating corresponding coordinates, we obtain the system
$$\begin{aligned} \alpha - \gamma &= 1 \\ \beta + \gamma &= 1 \\ \alpha + 2\beta &= 1. \end{aligned}$$
Adding the first two equations gives $\alpha + \beta = 2$, and solving this and the last equation gives $\beta = -1$ and $\alpha = 3$. Substitution then gives $\gamma = 2$, so the required linear combination is
$$(1,1,1) = 3(1,0,1) - (0,1,2) + 2(-1,1,0).$$


(You may have used Gauss-Jordan elimination to solve the 
system of linear equations, rather than solving them directly. 
Either method is fine.) 




Exercise C51 


(a) In $\mathbb{R}^2$, express the vector (2,4) as a linear combination of the vectors (0,3) and (2,1).

(b) In $\mathbb{R}^3$, express the vector (2,3,-2) as a linear combination of the vectors (0,1,0), (1,2,-1) and (1,1,-2).

(c) In $M_{2,2}$, express the matrix … as a linear combination of the matrices … and ….


2.2 Spanning sets 


We now look at the set of vectors that is produced when we form all 
possible linear combinations of a given set of vectors. 


Picture any two vectors in $\mathbb{R}^2$, and suppose that we form all possible linear combinations of these two vectors. What vectors do we obtain? Are there any vectors in $\mathbb{R}^2$ that cannot be written as a linear combination of these two vectors? (We saw such an example in Worked Exercise C27(b).) What happens if we start with one vector in $\mathbb{R}^2$? If we form all possible linear combinations of it, what vectors can result? What happens if we start with one, two or three vectors in $\mathbb{R}^3$?




Let us start with a set consisting of exactly one vector in $\mathbb{R}^2$, namely the set containing the vector (1,0). The set of all linear combinations of (1,0), illustrated in Figure 5, is
$$\{\alpha(1,0) : \alpha \in \mathbb{R}\} = \{(\alpha,0) : \alpha \in \mathbb{R}\}.$$
Geometrically, the members of this set are the points on the $x$-axis in $\mathbb{R}^2$. So this set of linear combinations is a line (the $x$-axis) in $\mathbb{R}^2$. We say that the set $\{(1,0)\}$ spans the $x$-axis, and that the $x$-axis is spanned by $\{(1,0)\}$.


Suppose that we now take the set $\{(1,0),(0,1)\}$ containing two vectors. The set of all linear combinations of (1,0) and (0,1), illustrated in Figure 6, is
$$\{\alpha(1,0) + \beta(0,1) : \alpha,\beta \in \mathbb{R}\} = \{(\alpha,\beta) : \alpha,\beta \in \mathbb{R}\}.$$
Since $\alpha$ and $\beta$ can take any real values, this set consists of all the points in $\mathbb{R}^2$. We say that $\{(1,0),(0,1)\}$ spans $\mathbb{R}^2$, and that $\mathbb{R}^2$ is spanned by $\{(1,0),(0,1)\}$.


We now write down the formal definitions of span and spanning, before 
looking at some more examples. 


Definitions 


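Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ be a finite set of vectors in a vector space $V$.

Then the span $\langle S \rangle$ of $S$ is the set of all possible linear combinations
$$\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k,$$
where $\alpha_1, \alpha_2, \ldots, \alpha_k$ are real numbers; that is,
$$\langle S \rangle = \{\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k : \alpha_1, \alpha_2, \ldots, \alpha_k \in \mathbb{R}\}.$$

We say that the set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ spans $\langle S \rangle$ or is a spanning set for $\langle S \rangle$, and that $\langle S \rangle$ is the set spanned by $S$.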


While $S$ is a finite set of vectors, the span $\langle S \rangle$ is an infinite set of vectors (such as a line or plane) whenever $S$ contains a non-zero vector: this is because the linear combinations involve arbitrary real numbers. In fact, the span $\langle S \rangle$ is itself a vector space, as you will see later, in Subsection 4.1 (Theorem C28).


To test whether a vector v lies in the span of a given set S, we use 
Strategy C6 to determine whether v can be written as a linear 
combination of the vectors in S. 
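For vectors in $\mathbb{R}^n$ there is an equivalent membership test that this unit does not use: $\mathbf{v}$ lies in $\langle S \rangle$ exactly when adding $\mathbf{v}$ to $S$ does not increase the rank (the number of non-zero rows left after row-reducing the matrix whose rows are the vectors). The sketch below, with hypothetical names `rank` and `in_span`, applies this test to the set in Worked Exercise C29. It is an alternative check rather than a replacement for Strategy C6, since it does not produce the coefficients.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of the matrix whose rows are the given vectors (exact arithmetic)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    if not rows:
        return 0
    r = 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        # Eliminate the entries below the pivot in column c.
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def in_span(v, vectors):
    """v lies in <S> exactly when adding v to S does not increase the rank."""
    return rank(list(vectors) + [v]) == rank(vectors)


S = [(1, 1, 0), (0, 1, 1)]
print(in_span((0, 0, 1), S), in_span((4, 2, -2), S))  # Worked Exercise C29
```

This prints `False True`, agreeing with parts (a) and (b) of Worked Exercise C29.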


Figure 5 The linear combinations of (1,0)

Figure 6 The linear combinations of (1,0) and (0,1)




Worked Exercise C29 


Let $S = \{(1,1,0),(0,1,1)\}$. Which of the following vectors belong to $\langle S \rangle$?
(a) (0,0,1) (b) (4,2,-2)


Solution 


We apply Strategy C6. 


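(a) We write
$$(0,0,1) = \alpha(1,1,0) + \beta(0,1,1) = (\alpha, \alpha + \beta, \beta).$$
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} \alpha &= 0 \\ \alpha + \beta &= 0 \\ \beta &= 1. \end{aligned}$$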


Subtracting the first and third equations from the second yields the equation $0 = -1$.


This system is inconsistent and therefore has no solution. So (0,0,1) does not belong to $\langle S \rangle$.


(b) We write
$$(4,2,-2) = \alpha(1,1,0) + \beta(0,1,1) = (\alpha, \alpha + \beta, \beta).$$
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} \alpha &= 4 \\ \alpha + \beta &= 2 \\ \beta &= -2. \end{aligned}$$
The first and third equations give $\alpha = 4$ and $\beta = -2$, and these values also satisfy the second equation. So (4,2,-2) belongs to $\langle S \rangle$ and it can be written as
$$(4,2,-2) = 4(1,1,0) - 2(0,1,1).$$


Exercise C52 


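Let $\mathbf{v}_1 = (1,0,3)$, $\mathbf{v}_2 = (0,2,0)$ and $\mathbf{v}_3 = (0,3,1)$ be three vectors in $\mathbb{R}^3$. Use Strategy C6 to determine whether the vector (1,5,4) lies in the subset of $\mathbb{R}^3$ spanned by each of the following sets.

(a) $\{\mathbf{v}_1, \mathbf{v}_2\}$ (b) $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$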


Strategy C6 can also be used to show that a given set of vectors is a 
spanning set for the whole of a particular vector space, as we show in the 
following worked exercise. 




Worked Exercise C30 


Show that each of the following is a spanning set for $\mathbb{R}^2$.

(a) {(1,2), (2,-3)} (b) {(1,0), (1,1), (1,-2)}


Solution 


We need to show that every vector in $\mathbb{R}^2$ can be expressed as a linear combination of the given vectors, so we show that the general vector $(x,y)$ can be.


(a) Each vector in $\mathbb{R}^2$ can be written as $(x,y)$. To show that $(x,y)$ is in $\langle\{(1,2),(2,-3)\}\rangle$, we write
$$(x,y) = \alpha(1,2) + \beta(2,-3) = (\alpha + 2\beta, 2\alpha - 3\beta).$$
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} \alpha + 2\beta &= x \\ 2\alpha - 3\beta &= y, \end{aligned}$$
whose unique solution is $\alpha = \tfrac{1}{7}(3x + 2y)$, $\beta = \tfrac{1}{7}(2x - y)$. So any vector in $\mathbb{R}^2$ can be written in terms of (1,2) and (2,-3) as
$$(x,y) = \tfrac{1}{7}(3x + 2y)(1,2) + \tfrac{1}{7}(2x - y)(2,-3).$$
Thus $\{(1,2),(2,-3)\}$ is a spanning set for $\mathbb{R}^2$; that is,
$$\langle\{(1,2),(2,-3)\}\rangle = \mathbb{R}^2.$$


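(b) Each vector in $\mathbb{R}^2$ can be written as $(x,y)$. To show that $(x,y)$ is in $\langle\{(1,0),(1,1),(1,-2)\}\rangle$, we write
$$(x,y) = \alpha(1,0) + \beta(1,1) + \gamma(1,-2) = (\alpha + \beta + \gamma, \beta - 2\gamma).$$
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} \alpha + \beta + \gamma &= x \\ \beta - 2\gamma &= y. \end{aligned}$$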

We saw in Unit C1 that a consistent system of $m$ equations in $n$ unknowns, with $m < n$, has an infinite solution set.


This is a system of two linear equations in three unknowns, so if 
there is a solution, there will be infinitely many solutions. 


We need just one solution, so we try to simplify things by setting $\gamma = 0$.


For example, taking $\gamma = 0$ gives $\beta = y$ and $\alpha = x - y$. So
$$(x,y) = (x - y)(1,0) + y(1,1) + 0(1,-2).$$
Thus $\langle\{(1,0),(1,1),(1,-2)\}\rangle = \mathbb{R}^2$.
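For $\mathbb{R}^2$ specifically there is a quick check, not the method of this unit: a finite set spans $\mathbb{R}^2$ exactly when it contains two linearly independent vectors, that is, some pair $(a,b)$, $(c,d)$ in the set has $ad - bc \neq 0$. The sketch below (hypothetical name `spans_R2`) applies this determinant test to both parts of Worked Exercise C30 and to a pair of collinear vectors.

```python
from itertools import combinations


def spans_R2(vectors):
    """Return True if the given vectors in R^2 span R^2.

    A set spans R^2 exactly when it contains two linearly independent
    vectors, i.e. some pair (a, b), (c, d) has determinant ad - bc != 0.
    """
    return any(u[0] * v[1] - u[1] * v[0] != 0
               for u, v in combinations(vectors, 2))


print(spans_R2([(1, 2), (2, -3)]))          # C30(a): True
print(spans_R2([(1, 0), (1, 1), (1, -2)]))  # C30(b): True
print(spans_R2([(1, 1), (2, 2)]))           # collinear: False
```

Unlike Strategy C6, this test only answers yes or no; it does not give the coefficients of a linear combination.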




The solution to Worked Exercise C30(b) shows that the set $\{(1,0),(1,1)\}$ is a spanning set for $\mathbb{R}^2$ so, in some sense, the vector (1,-2) is redundant. We return to this idea of redundant vectors in a spanning set in the next section.


Exercise C53 


Show that each of the following is a spanning set for $\mathbb{R}^2$.

(a) {(1,1), (-1,2)} (b) {(2,-1), (3,2)}


Exercise C54 


Show that {(1,0,0), (1, 1, 0), (2,0, 1)} is a spanning set for R3. 


The following worked exercise shows that Strategy C6 can be used for vector spaces other than $\mathbb{R}^2$ and $\mathbb{R}^3$.


Worked Exercise C31 


Show that $\{1 + x^2, x^2, 2 - x\}$ is a spanning set for $P_3$.




Exercise C55 


Show that $\{1 + x, 1 + x^2, 1 + x^3, x\}$ is a spanning set for $P_4$.


We look now at sets $S$ in vector spaces $V$ for which $\langle S \rangle$ is not the whole of $V$.


Worked Exercise C32 


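For each of the following vector spaces $V$ and sets of vectors $S$ in $V$, determine $\langle S \rangle$. In parts (a) and (b), describe $\langle S \rangle$ geometrically.

(a) $V = \mathbb{R}^2$, $S = \{(1,1)\}$.
(b) $V = \mathbb{R}^3$, $S = \{(1,0,1),(2,0,3)\}$.
(c) $V = M_{2,3}$, $S = \left\{ \begin{pmatrix} 2 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 & 3 \\ 0 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & -2 & 2 \\ 0 & 0 & 0 \end{pmatrix} \right\}$.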


Solution 
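(a) We have
$$\langle S \rangle = \{\alpha(1,1) : \alpha \in \mathbb{R}\} = \{(\alpha,\alpha) : \alpha \in \mathbb{R}\}.$$
A picture can help.

[Figure: the vector (1,1) and $\langle S \rangle$ in the plane]

Geometrically, $\langle S \rangle$ is the line $y = x$.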
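(b) We have
$$\begin{aligned} \langle S \rangle &= \{\alpha(1,0,1) + \beta(2,0,3) : \alpha,\beta \in \mathbb{R}\} \\ &= \{(\alpha + 2\beta, 0, \alpha + 3\beta) : \alpha,\beta \in \mathbb{R}\}. \end{aligned}$$
Every point in this set is of the form $(x,0,z)$. Thus
$$\langle S \rangle \subseteq \{(x,0,z) : x,z \in \mathbb{R}\}.$$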


To determine whether $\langle S \rangle$ is equal to this set we have to show that every vector $(x,0,z)$ can be expressed as a linear combination of (1,0,1) and (2,0,3).




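To show that every vector $(x,0,z)$, where $x,z \in \mathbb{R}$, belongs to $\langle S \rangle$, we write
$$(x,0,z) = (\alpha + 2\beta, 0, \alpha + 3\beta).$$
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} \alpha + 2\beta &= x \\ \alpha + 3\beta &= z. \end{aligned}$$
The solution is $\beta = z - x$ and $\alpha = 3x - 2z$, so
$$(x,0,z) = (3x - 2z)(1,0,1) + (z - x)(2,0,3).$$
Hence $(x,0,z) \in \langle S \rangle$, so any vector of the form $(x,0,z)$ can be written in terms of (1,0,1) and (2,0,3). It follows that
$$\langle S \rangle = \{(x,0,z) : x,z \in \mathbb{R}\}.$$
A picture can help.

[Figure: $\langle S \rangle$ drawn in $\mathbb{R}^3$]

Geometrically, $\langle S \rangle$ is the plane $y = 0$.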


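(c) We have
$$\begin{aligned} \langle S \rangle &= \left\{ \alpha\begin{pmatrix} 2 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix} + \beta\begin{pmatrix} 1 & 0 & 3 \\ 0 & 0 & 0 \end{pmatrix} + \gamma\begin{pmatrix} 0 & -2 & 2 \\ 0 & 0 & 0 \end{pmatrix} : \alpha,\beta,\gamma \in \mathbb{R} \right\} \\ &= \left\{ \begin{pmatrix} 2\alpha + \beta & -\alpha - 2\gamma & 3\beta + 2\gamma \\ 0 & 0 & 0 \end{pmatrix} : \alpha,\beta,\gamma \in \mathbb{R} \right\}. \end{aligned}$$
Every matrix in this set is of the form $\begin{pmatrix} a & b & c \\ 0 & 0 & 0 \end{pmatrix}$. Thus
$$\langle S \rangle \subseteq \left\{ \begin{pmatrix} a & b & c \\ 0 & 0 & 0 \end{pmatrix} : a,b,c \in \mathbb{R} \right\}.$$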




To determine whether $\langle S \rangle$ is equal to this set we have to show that every matrix of this form can be expressed as a linear combination of the three given matrices.


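To show that every $2 \times 3$ matrix with zero entries in the second row belongs to $\langle S \rangle$, we write
$$\begin{pmatrix} a & b & c \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 2\alpha + \beta & -\alpha - 2\gamma & 3\beta + 2\gamma \\ 0 & 0 & 0 \end{pmatrix}.$$
Equating corresponding entries, we obtain the system
$$\begin{aligned} 2\alpha + \beta &= a \\ -\alpha - 2\gamma &= b \\ 3\beta + 2\gamma &= c. \end{aligned}$$
It has solution
$$\alpha = \tfrac{1}{7}(3a - b - c), \quad \beta = \tfrac{1}{7}(a + 2b + 2c), \quad \gamma = -\tfrac{1}{14}(3a + 6b - c),$$
so
$$\langle S \rangle = \left\{ \begin{pmatrix} a & b & c \\ 0 & 0 & 0 \end{pmatrix} : a,b,c \in \mathbb{R} \right\}.$$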


Exercise C56 


For each of the following vector spaces $V$ and sets of vectors $S$ in $V$, determine $\langle S \rangle$.

(a) $V = \mathbb{R}^3$, $S = \{(1,0,0)\}$.

(b) $V = \ldots$, $S = \{\ldots\}$.




3 Bases and dimension 


In this section you will see that there is a minimum number of vectors 
needed to span a vector space. 


3.1 Linear independence and dependence 


In Section 2 we found several spanning sets for $\mathbb{R}^2$ and $\mathbb{R}^3$. For example, in Worked Exercise C30(b), we showed that each of the sets
$$\{(1,0),(1,1)\} \quad\text{and}\quad \{(1,0),(1,1),(1,-2)\}$$
spans $\mathbb{R}^2$. In order to be able to work efficiently with a vector space, we need to express each vector in it as a linear combination of a small number of vectors. In particular, it would be convenient if we could find a set containing the smallest number of vectors that spans the space; that is, we want to find a minimal spanning set.


The set $\{(1,0),(1,1),(1,-2)\}$ is clearly not a minimal spanning set for $\mathbb{R}^2$, since the smaller set $\{(1,0),(1,1)\}$ also spans $\mathbb{R}^2$. The vector (1,-2) is redundant because it can be written as a linear combination of the vectors (1,0) and (1,1):
$$(1,-2) = 3(1,0) - 2(1,1).$$
Thus, if a vector $(x,y)$ in $\mathbb{R}^2$ can be written as a linear combination of the vectors (1,0), (1,1) and (1,-2), then it can be written as a linear combination of just the vectors (1,0) and (1,1):
$$\begin{aligned} (x,y) &= \alpha(1,0) + \beta(1,1) + \gamma(1,-2) \\ &= \alpha(1,0) + \beta(1,1) + \gamma[3(1,0) - 2(1,1)] \\ &= (\alpha + 3\gamma)(1,0) + (\beta - 2\gamma)(1,1). \end{aligned}$$
The following general result holds.


Theorem C20 


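Suppose that the vector $\mathbf{v}_k$ can be written as a linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{k-1}$. Then the span of the set $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is the same as the span of the set $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{k-1}\}$.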


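Proof Let $S = \langle\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{k-1}\}\rangle$ and $T = \langle\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}\rangle$. Clearly, $S \subseteq T$, since any linear combination of $\mathbf{v}_1, \ldots, \mathbf{v}_{k-1}$ is also a linear combination of $\mathbf{v}_1, \ldots, \mathbf{v}_k$ (take the coefficient of $\mathbf{v}_k$ to be 0).

Now
$$T = \{\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k : \alpha_1, \alpha_2, \ldots, \alpha_k \in \mathbb{R}\}.$$
As $\mathbf{v}_k$ can be written as a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{k-1}$, it follows that
$$\mathbf{v}_k = \beta_1\mathbf{v}_1 + \beta_2\mathbf{v}_2 + \cdots + \beta_{k-1}\mathbf{v}_{k-1}, \quad \text{for some } \beta_1, \beta_2, \ldots, \beta_{k-1} \in \mathbb{R}.$$
So any vector of $T$ can be expressed in the form
$$\begin{aligned} &\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k \\ &= \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_{k-1}\mathbf{v}_{k-1} + \alpha_k(\beta_1\mathbf{v}_1 + \beta_2\mathbf{v}_2 + \cdots + \beta_{k-1}\mathbf{v}_{k-1}) \\ &= (\alpha_1 + \alpha_k\beta_1)\mathbf{v}_1 + (\alpha_2 + \alpha_k\beta_2)\mathbf{v}_2 + \cdots + (\alpha_{k-1} + \alpha_k\beta_{k-1})\mathbf{v}_{k-1}, \end{aligned}$$
which belongs to $S$. Thus $T \subseteq S$.

Combining these two results gives $S = T$, as required. ■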
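The substitution used for the redundant vector, $(1,-2) = 3(1,0) - 2(1,1)$, can be spot-checked numerically. The sketch below (hypothetical helper names) verifies on a grid of integer coefficients that $\alpha(1,0) + \beta(1,1) + \gamma(1,-2)$ equals $(\alpha + 3\gamma)(1,0) + (\beta - 2\gamma)(1,1)$. A finite check like this illustrates the identity in Theorem C20's setting but is not a proof.

```python
from itertools import product


def lin_comb(coeffs, vectors):
    """Return sum(c * v) for vectors in R^2."""
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(2))


def rewrite_without_redundant(alpha, beta, gamma):
    """Coefficients of (1,0), (1,1) after substituting (1,-2) = 3(1,0) - 2(1,1)."""
    return (alpha + 3 * gamma, beta - 2 * gamma)


# Check the identity for every integer triple with entries from -3 to 3.
for a, b, g in product(range(-3, 4), repeat=3):
    lhs = lin_comb((a, b, g), [(1, 0), (1, 1), (1, -2)])
    rhs = lin_comb(rewrite_without_redundant(a, b, g), [(1, 0), (1, 1)])
    assert lhs == rhs
print("identity holds on the grid")
```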


So, in order to tell whether a spanning set is minimal, we need to be able 
to test whether every vector in the set can be written as a linear 
combination of the remaining vectors in the set. To make this task easier, 
we introduce the ideas of linear dependence and linear independence. 


Definitions 


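A finite set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ in a vector space $V$ is linearly dependent if there exist real numbers $\alpha_1, \alpha_2, \ldots, \alpha_k$, not all zero, such that
$$\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k = \mathbf{0}.$$

A finite set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is linearly independent if it is not linearly dependent; that is, if
$$\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k = \mathbf{0}$$
only when $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$.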


Note that $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$ is a solution to the equation whether the set of vectors is linearly dependent or linearly independent. So the distinction between the two cases is whether there is also a solution in which at least one $\alpha_i$ is non-zero.


We use the term linearly dependent because if a set of vectors is linearly dependent, then one of the vectors can be written as a linear combination of the others; that is, this vector depends on the others. If
$$\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k = \mathbf{0},$$
and $\alpha_k$ (for example) is non-zero, then we can rearrange the equation to give
$$\mathbf{v}_k = -\frac{\alpha_1}{\alpha_k}\mathbf{v}_1 - \cdots - \frac{\alpha_{k-1}}{\alpha_k}\mathbf{v}_{k-1},$$
so that $\mathbf{v}_k$ is a linear combination of the remaining vectors.


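For example, if $2\mathbf{v}_1 + 3\mathbf{v}_2 - 4\mathbf{v}_3 = \mathbf{0}$, then $\mathbf{v}_3 = \tfrac{1}{2}\mathbf{v}_1 + \tfrac{3}{4}\mathbf{v}_2$. In this case, $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a linearly dependent set. We can also write $\mathbf{v}_1$ in terms of $\mathbf{v}_2$ and $\mathbf{v}_3$, and similarly $\mathbf{v}_2$ in terms of $\mathbf{v}_1$ and $\mathbf{v}_3$.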




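Conversely, if one of a set of vectors can be written as a linear combination of the others, then the set is linearly dependent. For instance, if $\mathbf{v}_k = \beta_1\mathbf{v}_1 + \cdots + \beta_{k-1}\mathbf{v}_{k-1}$, then $\beta_1\mathbf{v}_1 + \cdots + \beta_{k-1}\mathbf{v}_{k-1} - \mathbf{v}_k = \mathbf{0}$, in which the coefficient of $\mathbf{v}_k$ is $-1 \neq 0$, so $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a linearly dependent set.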


Statements 1 to 4 below follow from the definitions. 


1. If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a linearly independent set, then there is only one way in which the zero vector can be expressed as a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$; that is, the trivial way
$$\mathbf{0} = 0\mathbf{v}_1 + 0\mathbf{v}_2 + \cdots + 0\mathbf{v}_k.$$

2. If $\mathbf{v}_1$ is the zero vector, then for any non-zero $\alpha \in \mathbb{R}$,
$$\alpha\mathbf{v}_1 + 0\mathbf{v}_2 + \cdots + 0\mathbf{v}_k = \mathbf{0},$$
so any set of vectors containing the zero vector is linearly dependent. It follows that a linearly independent set cannot contain the zero vector.

3. Any set consisting of just one non-zero vector $\mathbf{v}$ is linearly independent because if $\alpha\mathbf{v} = \mathbf{0}$, then either $\alpha = 0$ or $\mathbf{v} = \mathbf{0}$. Since $\mathbf{v}$ is non-zero, we must have $\alpha = 0$, so the set $\{\mathbf{v}\}$ is linearly independent.

4. Any set of two non-zero vectors is linearly dependent if one of the vectors is a multiple of the other, and linearly independent otherwise. This applies to vectors in all vector spaces: it is not restricted to vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$.

As an example of statement 4, consider the set $\{(1,1,2),(2,2,4)\}$ in $\mathbb{R}^3$. We have
$$(2,2,4) = 2(1,1,2),$$
so
$$-2(1,1,2) + (2,2,4) = (0,0,0),$$
which is the zero vector in $\mathbb{R}^3$. In this case $\alpha_1 = -2$ and $\alpha_2 = 1$. So this set is linearly dependent.


Similarly, $\{3 - 2x + x^2, 6 - 4x + 2x^2\}$ is a linearly dependent set in $P_3$ because
$$6 - 4x + 2x^2 = 2(3 - 2x + x^2),$$
so
$$2(3 - 2x + x^2) - (6 - 4x + 2x^2) = 0 + 0x + 0x^2,$$
which is the zero vector in $P_3$. In this case $\alpha_1 = 2$ and $\alpha_2 = -1$.


However, neither $\{(1,1,2),(1,2,-3)\}$ nor $\{3 - 2x + x^2, -1 + x + 2x^2\}$ is a linearly dependent set, as in each case neither vector is a multiple of the other.


Statement 4 therefore gives us a particularly simple way of checking whether a set of two non-zero vectors is linearly dependent or linearly independent: namely, a set of two non-zero vectors is linearly independent if and only if neither vector is a multiple of the other. For vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$, this is equivalent to saying that two non-zero vectors are linearly independent if and only if they do not lie along the same straight line through the origin; that is, they are not collinear, as illustrated in Figure 7.


Figure 7 Two vectors in $\mathbb{R}^2$ that are (a) linearly independent (b) linearly dependent


In this geometric interpretation of $\mathbb{R}^2$, a vector $(x,y)$ is the position vector $(x,y)$, not the point with coordinates $(x,y)$. Therefore ‘being collinear’ is a property of the vectors (position vectors), not of the points with these coordinates. For example, the two points (1,0) and (1,1) are collinear since they lie on the line $x = 1$, whereas the vectors (1,0) and (1,1) are not collinear since they are not multiples of one another and they do not both lie on a line through the origin: they are linearly independent vectors. By their definition as position vectors, collinear vectors will always lie on a line through the origin.


Similarly, three non-zero vectors in $\mathbb{R}^3$ are linearly independent if and only if they do not lie in the same plane through the origin; that is, they are not coplanar, as illustrated in Figure 8. In this geometric interpretation of $\mathbb{R}^3$, ‘being coplanar’ is again a property of the vectors (position vectors), not the points, so coplanar vectors in $\mathbb{R}^3$ will always lie on a plane through the origin.


Figure 8 Three vectors in $\mathbb{R}^3$ that are (a) linearly independent (b) linearly dependent




More generally, we can use the following strategy to test whether a set of 
vectors is linearly independent. 


Strategy C7 


To test whether a given set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is linearly independent:

1. write down the equation $\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k = \mathbf{0}$

2. express this equation as a system of linear equations in the unknowns $\alpha_1, \alpha_2, \ldots, \alpha_k$

3. solve the resulting system of equations.

If the only solution is $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$, then the set of vectors is linearly independent.

If there is a solution with at least one of $\alpha_1, \alpha_2, \ldots, \alpha_k$ not equal to zero, then the set of vectors is linearly dependent.
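Strategy C7 can be sketched in the same way as Strategy C6, this time for the homogeneous system. The Python function below is illustrative only (the name `dependence_relation` is hypothetical, and vectors in $\mathbb{R}^n$ are assumed to be tuples). It row-reduces the system by Gauss-Jordan elimination; if some unknown is free, it sets that unknown to 1, the other free unknowns to 0, and returns the resulting non-trivial solution.

```python
from fractions import Fraction


def dependence_relation(vectors):
    """Sketch of Strategy C7 for vectors in R^n.

    Solves a1*v1 + ... + ak*vk = 0 by Gauss-Jordan elimination.
    Returns coefficients, not all zero, if the set is linearly dependent,
    or None if the only solution is a1 = ... = ak = 0.
    """
    n, k = len(vectors[0]), len(vectors)
    # Coefficient matrix of the homogeneous system: column j is vectors[j].
    rows = [[Fraction(vec[i]) for vec in vectors] for i in range(n)]
    pivot_cols = []
    r = 0
    for c in range(k):
        if r == n:
            break  # every row already has a pivot
        p = next((i for i in range(r, n) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [x / pivot for x in rows[r]]
        for i in range(n):
            if i != r:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    free = [c for c in range(k) if c not in pivot_cols]
    if not free:
        return None  # only the trivial solution
    # Set the first free unknown to 1, the other free unknowns to 0,
    # and read off the pivot unknowns from the reduced rows.
    f = free[0]
    coeffs = [Fraction(0)] * k
    coeffs[f] = Fraction(1)
    for i, c in enumerate(pivot_cols):
        coeffs[c] = -rows[i][f]
    return coeffs


print(dependence_relation([(2, 0, 0), (0, 0, 1), (-1, 2, 1)]))  # C33(a)
print(dependence_relation([(1, 1, 1), (0, 2, 1), (1, 5, 3)]))   # C33(b)
```

For Worked Exercise C33(b) this returns the coefficients -1, -2, 1, which is the solution $\alpha = -k$, $\beta = -2k$, $\gamma = k$ with $k = 1$; for part (a) it returns `None`.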


Worked Exercise C33 


Use Strategy C7 to determine whether each of the following sets of vectors in $\mathbb{R}^3$ is linearly independent.

(a) {(2,0,0), (0,0,1), (-1,2,1)} (b) {(1,1,1), (0,2,1), (1,5,3)}


Solution 
We follow the steps of Strategy C7. 
(a) We write $\alpha(2,0,0) + \beta(0,0,1) + \gamma(-1,2,1) = (0,0,0)$.


This simplifies to $(2\alpha - \gamma, 2\gamma, \beta + \gamma) = (0,0,0)$. Equating corresponding coordinates gives the equations we need.


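This gives the system of linear equations
$$\begin{aligned} 2\alpha - \gamma &= 0 \\ 2\gamma &= 0 \\ \beta + \gamma &= 0. \end{aligned}$$
The second equation gives $\gamma = 0$. Substituting this value into the other two equations gives $\alpha = 0$ and $\beta = 0$. The only solution is $\alpha = \beta = \gamma = 0$.

Therefore this set of vectors is linearly independent.
(b) We write $\alpha(1,1,1) + \beta(0,2,1) + \gamma(1,5,3) = (0,0,0)$. This gives the system of linear equations
$$\begin{aligned} \alpha + \gamma &= 0 \\ \alpha + 2\beta + 5\gamma &= 0 \\ \alpha + \beta + 3\gamma &= 0. \end{aligned}$$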




A solution is not so easy to see, so we use the method of Gauss-Jordan elimination from Unit C1.


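We perform row-reduction on the augmented matrix for this system of linear equations.
$$\left(\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 1 & 2 & 5 & 0 \\ 1 & 1 & 3 & 0 \end{array}\right) \quad \begin{array}{l} \mathbf{r}_2 \to \mathbf{r}_2 - \mathbf{r}_1 \\ \mathbf{r}_3 \to \mathbf{r}_3 - \mathbf{r}_1 \end{array}$$
$$\left(\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 0 & 2 & 4 & 0 \\ 0 & 1 & 2 & 0 \end{array}\right) \quad \mathbf{r}_2 \to \tfrac{1}{2}\mathbf{r}_2$$
$$\left(\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 1 & 2 & 0 \end{array}\right) \quad \mathbf{r}_3 \to \mathbf{r}_3 - \mathbf{r}_2$$
$$\left(\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$$
The corresponding system of equations is
$$\begin{aligned} \alpha + \gamma &= 0 \\ \beta + 2\gamma &= 0. \end{aligned}$$
The solution set of the system is
$$\alpha = -k, \quad \beta = -2k, \quad \gamma = k, \quad k \in \mathbb{R},$$
so there are infinitely many solutions. For example, $k = -1$ gives
$$(1,1,1) + 2(0,2,1) - (1,5,3) = (0,0,0).$$
So this set of vectors is linearly dependent.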


Any one of the vectors can be written as a linear combination of the other two; for example, $(1,1,1) = (1,5,3) - 2(0,2,1)$.


We claimed earlier that three non-zero linearly dependent vectors in $\mathbb{R}^3$ are coplanar, and this was the case in Worked Exercise C33(b). You may like to check that all the vectors in the set lie in the plane through the origin with equation $x + y - 2z = 0$.
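One quick way to carry out this check, not used in this unit, is the scalar triple product: $(\mathbf{u} \times \mathbf{v}) \cdot \mathbf{w}$ is zero exactly when $\mathbf{u}$, $\mathbf{v}$, $\mathbf{w}$ lie in a plane through the origin, and the cross product $\mathbf{u} \times \mathbf{v}$ is a normal to that plane. The sketch below (hypothetical helper names) recovers the normal $(-1,-1,2)$, which gives the plane $-x - y + 2z = 0$, that is, $x + y - 2z = 0$.

```python
def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of two vectors in R^3."""
    return sum(a * b for a, b in zip(u, v))


def coplanar(u, v, w):
    """True when u, v, w lie in one plane through the origin.

    This is the scalar triple product test: (u x v) . w == 0.
    """
    return dot(cross(u, v), w) == 0


u, v, w = (1, 1, 1), (0, 2, 1), (1, 5, 3)   # Worked Exercise C33(b)
n = cross(u, v)                              # a normal to the plane
print(n, coplanar(u, v, w))                  # (-1, -1, 2) True
```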


In the following exercise you are asked to determine whether given sets of 
vectors are linearly independent or not. Before embarking on the algebra, 
have a look at each set of vectors and try to decide whether you expect the 
set to be linearly dependent or linearly independent; it may be that 
Strategy C7 is not needed in some cases. 




Exercise C57 


Determine whether each of the following sets of vectors is a linearly independent subset of $V$.

(a) $V = \mathbb{R}^2$, $\{(1,0), (-1,\ldots)\}$.

(b) $V = \mathbb{R}^2$, $\{(1,-1), (1,1), (2,1)\}$.

(c) $V = \mathbb{R}^3$, $\{(1,1,0), (-1,1,1)\}$.

(d) $V = \mathbb{R}^3$, $\{(1,0,0), (1,1,0), (1,1,1)\}$.

(e) $V = \mathbb{R}^4$, $\{(1,2,1,0), (0,-1,1,3)\}$.


We conclude this subsection by looking briefly at linearly dependent and linearly independent sets of vectors in vector spaces other than $\mathbb{R}^2$, $\mathbb{R}^3$ and $\mathbb{R}^4$. Again, before embarking on the algebra, it is sensible to have a look at each set of vectors: it may be that Strategy C7 is not needed in some cases.


Worked Exercise C34 


Determine whether the set of polynomials $\{1, 4x, 4x + x^2\}$ is a linearly independent subset of $P_3$.


Worked Exercise C35 


In each case, determine whether the set $S$ of matrices is a linearly independent subset of $M_{2,2}$.

(a) $S = \{\ldots\}$

(b) $S = \{\ldots\}$

(c) $S = \{\ldots\}$


Solution 


(a) There are just two matrices and neither is a multiple of the other, so the strategy is unnecessary.

The set $S$ is linearly independent because neither matrix is a multiple of the other.


(b) The second matrix is a multiple of the first ($-2$ times), so the strategy is unnecessary.

The set $S$ is linearly dependent because 2 times the first matrix plus the second matrix is the zero matrix.
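(c) There is no obvious linear dependence.
We apply Strategy C7.
Writing $A_1$, $A_2$, $A_3$ for the three matrices in $S$, we write
$$\alpha A_1 + \beta A_2 + \gamma A_3 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$
which can be written as
$$\begin{pmatrix} \alpha + 2\gamma & \cdots \\ -2\beta + 2\gamma & 2\alpha + \beta + 3\gamma \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

Equating corresponding entries, we obtain the system
$$\begin{aligned} \alpha + 2\gamma &= 0 \\ \cdots + 3\gamma &= 0 \\ -2\beta + 2\gamma &= 0 \\ 2\alpha + \beta + 3\gamma &= 0. \end{aligned}$$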


The first and third equations both simply relate two unknowns, so it is sensible to start with these.


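From the third equation we have $2\beta = 2\gamma$, that is, $\beta = \gamma$, and from the first equation $\alpha = -2\gamma$. If we choose $\gamma = 1$, then $\beta = 1$ and $\alpha = -2$, and these also satisfy the second and fourth equations; thus $-2$ times the first matrix, plus the second matrix, plus the third matrix, is the zero matrix $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$.

So we can find $\alpha$, $\beta$ and $\gamma$, not all zero, such that the original equation is satisfied. So the set of matrices is linearly dependent: it is not a linearly independent subset of $M_{2,2}$.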




Exercise C58 


In each of the following cases, determine whether $S$ is a linearly independent subset of the vector space $V$.

(a) $V = P_4$, $S = \{1, x, x^2, x^3, 1 + x + x^2 + x^3\}$.

(b) $V = \ldots$, $S = \{\ldots\}$.

(c) $V = \ldots$, $S = \{\ldots\}$.

(d) $V = \ldots$, $S = \{\ldots\}$.


3.2 Bases 


We now use the idea of linear independence to help us find a minimal set 
of vectors that spans a vector space. 


If we have a set of vectors that forms a spanning set for a vector space, 
then the set is a minimal spanning set if and only if it is linearly 
independent. 


This condition is certainly necessary because, as we showed in the previous 
subsection, if the set of vectors is linearly dependent, then we can write at 
least one of the vectors as a linear combination of the other vectors. Such 
a vector is redundant, and we can drop it from the set, so the set is not a 
minimal set. 


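The condition is also sufficient; we prove this using proof by contradiction. Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ be a linearly independent spanning set for a vector space $V$, and suppose that some smaller subset of $S$ also spans $V$. By relabelling the vectors if necessary, we may assume that this smaller subset omits $\mathbf{v}_k$, so that $S_1 = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{k-1}\}$ also spans $V$. This means we can write any vector in $V$ as a linear combination of the vectors in $S_1$. In particular we can write
$$\mathbf{v}_k = \alpha_1\mathbf{v}_1 + \cdots + \alpha_{k-1}\mathbf{v}_{k-1},$$
for some real numbers $\alpha_1, \ldots, \alpha_{k-1}$. Therefore
$$\alpha_1\mathbf{v}_1 + \cdots + \alpha_{k-1}\mathbf{v}_{k-1} - \mathbf{v}_k = \mathbf{0},$$
in which the coefficient of $\mathbf{v}_k$ is $-1 \neq 0$, so $S$ is not linearly independent. But this is a contradiction, so our initial assumption that $S_1$ spans $V$ must be wrong. Thus $S_1$ cannot span $V$, and $S$ is a minimal spanning set.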


If we have a linearly independent set of vectors that spans a vector space, 
then we give the set of vectors a special name. 


Definition 


A basis for a vector space V is a linearly independent set of vectors 
that is a spanning set for V. 


The plural of basis is bases. A basis of a vector space V is one set of 
linearly independent vectors that spans V; a basis is not unique, so V can 
have many different bases. 


You saw in Exercise C53(a) that $\{(1,1),(-1,2)\}$ is a spanning set for $\mathbb{R}^2$. Since it is also a linearly independent set, it is a basis for $\mathbb{R}^2$. Although the set $\{(1,0),(1,1),(1,-2)\}$ is also a spanning set for $\mathbb{R}^2$, it is not linearly independent, as we showed earlier in this section, so it is not a basis for $\mathbb{R}^2$. While each vector in $\mathbb{R}^2$ can be written as a linear combination of vectors in the spanning set $\{(1,0),(1,1),(1,-2)\}$, this expression is not unique. For example,
$$\begin{aligned} (0,1) &= 2(1,0) - 1(1,1) - 1(1,-2) \\ &= -4(1,0) + 3(1,1) + 1(1,-2). \end{aligned}$$

An important property of a basis for a vector space V is that each vector 
in V has a unique expression as a linear combination of basis vectors. 
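By contrast with the two different expressions for (0,1) in terms of the spanning set $\{(1,0),(1,1),(1,-2)\}$, the coefficients with respect to a basis are unique, so they can be computed by a formula. For a basis $\{\mathbf{b}_1, \mathbf{b}_2\}$ of $\mathbb{R}^2$, one such formula is Cramer's rule. The sketch below (hypothetical name `coordinates_2d`) assumes the two vectors form a basis, which for $\mathbb{R}^2$ is the same as a non-zero $2 \times 2$ determinant, and raises an error otherwise. For the basis $\{(1,1),(-1,2)\}$ it gives $(0,1) = \tfrac{1}{3}(1,1) + \tfrac{1}{3}(-1,2)$.

```python
from fractions import Fraction


def coordinates_2d(v, b1, b2):
    """Return the unique (alpha, beta) with v = alpha*b1 + beta*b2.

    Solves the 2x2 system by Cramer's rule; assumes {b1, b2} is a basis
    for R^2, i.e. the determinant below is non-zero.
    """
    det = b1[0] * b2[1] - b2[0] * b1[1]
    if det == 0:
        raise ValueError("b1 and b2 do not form a basis for R^2")
    alpha = Fraction(v[0] * b2[1] - b2[0] * v[1], det)
    beta = Fraction(b1[0] * v[1] - v[0] * b1[1], det)
    return alpha, beta


print(coordinates_2d((0, 1), (1, 1), (-1, 2)))
```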


Theorem C21 


Let $S$ be a basis for a vector space $V$. Then each vector in $V$ can be expressed as a linear combination of the vectors in $S$ in only one way.


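Proof Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ be a basis for a vector space $V$.

We assume that a vector in $V$ can be written as a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ in two different ways, and show that this leads to a contradiction.

Let $\mathbf{u}$ be a vector in $V$, and assume that we can write $\mathbf{u}$ as a linear combination of the vectors in $S$ in two different ways as
$$\mathbf{u} = \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_k\mathbf{v}_k$$
and
$$\mathbf{u} = \beta_1\mathbf{v}_1 + \beta_2\mathbf{v}_2 + \cdots + \beta_k\mathbf{v}_k.$$
Then
$$\mathbf{0} = \mathbf{u} - \mathbf{u} = (\alpha_1 - \beta_1)\mathbf{v}_1 + (\alpha_2 - \beta_2)\mathbf{v}_2 + \cdots + (\alpha_k - \beta_k)\mathbf{v}_k,$$
and, since the two expressions are different, $(\alpha_1 - \beta_1), (\alpha_2 - \beta_2), \ldots, (\alpha_k - \beta_k)$ are not all zero.

Therefore the set $S$ is linearly dependent. But $S$ is a basis for $V$, and is therefore linearly independent. This contradiction shows that Theorem C21 is true. ■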




The definition of a basis gives us a strategy for testing whether a given set 
of vectors is a basis for a particular vector space. 


Strategy C8 


To determine whether a set of vectors S in a vector space V is a basis 
for V, check the following conditions. 


(1) S is linearly independent. 

(2) S spans V. 

If both (1) and (2) hold, then S is a basis for V. 

If either (1) or (2) does not hold, then S is not a basis for V. 
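For sets of vectors in $\mathbb{R}^3$ there is a shortcut that this unit does not use: three vectors form a basis for $\mathbb{R}^3$ exactly when the $3 \times 3$ determinant with those vectors as rows is non-zero. The sketch below (hypothetical names `det3` and `is_basis_R3`) also relies on the standard fact that every basis for $\mathbb{R}^3$ has exactly three vectors, so it rejects sets of any other size. It agrees with the conclusions of Worked Exercises C36 and C37.

```python
def det3(u, v, w):
    """Determinant of the 3x3 matrix with rows u, v, w (cofactor expansion)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def is_basis_R3(vectors):
    """True when the given vectors form a basis for R^3.

    Uses two standard facts rather than Strategy C8: a basis for R^3 has
    exactly three vectors, and three vectors form a basis exactly when
    their determinant is non-zero.
    """
    return len(vectors) == 3 and det3(*vectors) != 0


print(is_basis_R3([(2, 0, 2), (1, 1, 1), (0, 1, -1)]))   # C36: True
print(is_basis_R3([(0, 1, 2), (1, 2, -1)]))              # C37(a): False
print(is_basis_R3([(1, 1, 1), (0, 2, 1), (-1, 1, 0)]))   # C37(b): False
```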


Worked Exercise C36 


Show that $S = \{(2,0,2),(1,1,1),(0,1,-1)\}$ is a basis for $\mathbb{R}^3$.


Solution
We check both conditions in Strategy C8.
We start by checking condition (1): $S$ is linearly independent.
Using Strategy C7, we write
$$\alpha(2,0,2) + \beta(1,1,1) + \gamma(0,1,-1) = (0,0,0),$$
which simplifies to
$$(2\alpha + \beta, \beta + \gamma, 2\alpha + \beta - \gamma) = (0,0,0).$$
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} 2\alpha + \beta &= 0 \\ \beta + \gamma &= 0 \\ 2\alpha + \beta - \gamma &= 0. \end{aligned}$$
We could use Gauss-Jordan elimination, but we can solve this system directly.

Subtracting the third equation from the first gives $\gamma = 0$, and substituting this into the second equation gives $\beta = 0$. Finally, substituting $\beta = 0$ into the first equation gives $\alpha = 0$. The only solution is $\alpha = \beta = \gamma = 0$.

Therefore the set $S$ is linearly independent.

We now check condition (2): $S$ spans $\mathbb{R}^3$.

We apply Strategy C6.

We need to show that every vector in $\mathbb{R}^3$ can be expressed as a linear combination of the vectors in $S$, so we show that the general vector $(x,y,z)$ can be.




Each vector in $\mathbb{R}^3$ can be written as $(x,y,z)$, with $x,y,z \in \mathbb{R}$. To show that $(x,y,z)$ is in $\langle S \rangle$, we write
$$(x,y,z) = \alpha(2,0,2) + \beta(1,1,1) + \gamma(0,1,-1).$$
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} 2\alpha + \beta &= x \\ \beta + \gamma &= y \\ 2\alpha + \beta - \gamma &= z. \end{aligned}$$
Subtracting the third equation from the first gives $\gamma = x - z$, and substituting this into the second equation gives $\beta = y - x + z$. Finally, substituting for $\beta$ in the first equation gives $\alpha = \tfrac{1}{2}(2x - y - z)$. We have a solution, so any vector in $\mathbb{R}^3$ can be written in terms of vectors in $S$ as
$$(x,y,z) = \tfrac{1}{2}(2x - y - z)(2,0,2) + (y - x + z)(1,1,1) + (x - z)(0,1,-1).$$
Therefore $S$ spans $\mathbb{R}^3$.
Since conditions (1) and (2) hold, the set $S$ is a basis for $\mathbb{R}^3$.
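The final formula in Worked Exercise C36 can be spot-checked by substitution. The sketch below (hypothetical helper names) recombines the three coefficients for every integer vector $(x,y,z)$ with entries from $-2$ to $2$ and asserts that the original vector comes back; this is a sanity check on the algebra, not a proof.

```python
from fractions import Fraction
from itertools import product

# The set S from Worked Exercise C36.
S = [(2, 0, 2), (1, 1, 1), (0, 1, -1)]


def coefficients(x, y, z):
    """(alpha, beta, gamma) as found in the solution of Worked Exercise C36."""
    return (Fraction(2 * x - y - z, 2), y - x + z, x - z)


def combine(coeffs, vectors):
    """Return sum(c * v) for vectors in R^3."""
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(3))


for x, y, z in product(range(-2, 3), repeat=3):
    assert combine(coefficients(x, y, z), S) == (x, y, z)
print("formula checked on", 5 ** 3, "vectors")
```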


Worked Exercise C37 


Determine whether each of the following sets is a basis for $\mathbb{R}^3$.
(a) {(0,1,2), (1,2,-1)} (b) {(1,1,1), (0,2,1), (-1,1,0)}


Solution 
(a) We check both conditions in Strategy C8. 


The set $\{(0,1,2),(1,2,-1)\}$ is linearly independent, as neither vector is a multiple of the other.


We apply Strategy C6. 


We need to show that every vector in $\mathbb{R}^3$ can be expressed as a linear combination of the given vectors, so we show that the general vector can be.


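Each vector in $\mathbb{R}^3$ can be written as $(x,y,z)$, with $x,y,z \in \mathbb{R}$. To show that $(x,y,z)$ is in $\langle\{(0,1,2),(1,2,-1)\}\rangle$, we write
$$(x,y,z) = \alpha(0,1,2) + \beta(1,2,-1).$$
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} \beta &= x \\ \alpha + 2\beta &= y \\ 2\alpha - \beta &= z. \end{aligned}$$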




Substituting $\beta = x$ from the first equation into the other two equations gives
$$\alpha = y - 2x, \qquad \alpha = \tfrac{1}{2}(x + z).$$
The vector $(x,y,z)$ is a general vector, so we need a solution for every possible combination of $x$, $y$ and $z$.

These two equations are true simultaneously if and only if $y - 2x = \tfrac{1}{2}(x + z)$; that is, if and only if $5x - 2y + z = 0$.

This is not true for every $x$, $y$ and $z$. In fact, it shows that $\langle\{(0,1,2),(1,2,-1)\}\rangle$ is the plane $5x - 2y + z = 0$ in $\mathbb{R}^3$; thus any point not on this plane cannot be written as a linear combination of the vectors (0,1,2) and (1,2,-1).


So not every vector in $\mathbb{R}^3$ lies in the span; for example, (1,0,0) does not, since $5 \cdot 1 - 2 \cdot 0 + 0 = 5 \neq 0$. Hence $\{(0,1,2),(1,2,-1)\}$ is not a spanning set for $\mathbb{R}^3$. Thus it is not a basis for $\mathbb{R}^3$.


(b) We check both conditions in Strategy C8. 


Before diving into Strategy C7, we quickly look at the given vectors to see if there is any obvious linear dependence.


Here we have
$$(-1,1,0) = -(1,1,1) + (0,2,1),$$
so these vectors are not linearly independent.

Therefore the set $\{(1,1,1),(0,2,1),(-1,1,0)\}$ is not a basis for $\mathbb{R}^3$.


Exercise C59 


Determine whether each of the following sets is a basis for $\mathbb{R}^3$.
(a) {(0,1,2), (0,2,3), (0,6,1)}
(b) {(…), (1,0,-1), (0,3,1)}
(c) {(1,0,0), (0,1,0), (0,0,1), (1,1,1)}
Exercise C60 


Determine whether {(1,2,-1,-1), (-1,5,1,3)} is a basis for $\mathbb{R}^4$.


We now consider bases for vector spaces other than $\mathbb{R}^2$, $\mathbb{R}^3$ and $\mathbb{R}^4$.


Worked Exercise C38 


Determine whether each of the following sets is a basis for $P_3$.
(a) $\{1, x, x^2\}$ (b) $\{1, x\}$ (c) $\{1, 2 + x^2, x^2\}$


Solution

(a) We check both conditions in Strategy C8.
We check whether $\{1, x, x^2\}$ is linearly independent.
Using Strategy C7, we write
$$\alpha 1 + \beta x + \gamma x^2 = 0 + 0x + 0x^2.$$
Comparing coefficients, we have $\alpha = \beta = \gamma = 0$ as the only solution, so the set is linearly independent.

We check whether $\{1, x, x^2\}$ spans $P_3$.
We apply Strategy C6.

We need to show that every vector (polynomial) in $P_3$ can be written as a linear combination of $1$, $x$ and $x^2$, so we show that the general vector $a + bx + cx^2$ can be.

Each vector in $P_3$ can be written as $a + bx + cx^2$, with $a,b,c \in \mathbb{R}$. To show that $a + bx + cx^2$ is in $\langle\{1, x, x^2\}\rangle$, we write
$$a + bx + cx^2 = \alpha(1) + \beta(x) + \gamma(x^2).$$
Equating coefficients, we see that $\alpha = a$, $\beta = b$ and $\gamma = c$. Therefore the set of vectors spans $P_3$.
Thus $\{1, x, x^2\}$ is a basis for $P_3$.


(b) Notice that $x^2$ cannot be expressed as a linear combination of $1$ and $x$.

None of the vectors contains an $x^2$ term, so the set $\{1, x\}$ does not span $P_3$.

Therefore this set of vectors is not a basis for $P_3$.

You may have noticed that neither vector is a multiple of the other, so the set $\{1, x\}$ is linearly independent. The span of this set consists of polynomials of the form $a + bx$, which is a proper subset of $P_3$.


(c) Here we have
$$2 + x^2 = 2(1) + 1(x^2),$$
so the set $\{1, 2 + x^2, x^2\}$ is not linearly independent.
Therefore $\{1, 2 + x^2, x^2\}$ is not a basis for $P_3$.


The span of this set consists of all polynomials of the form $a + bx^2$, which again is a proper subset of $P_3$.
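The calculations in Worked Exercise C38 amount to comparing coefficient vectors: a polynomial $a + bx + cx^2$ in $P_3$ can be identified with $(a, b, c)$. The Python sketch below (our own illustration; the dictionary representation `{power: coefficient}` and the names `rank`, `coefficients` and `check_P3` are hypothetical choices) reports the two conditions of Strategy C8 separately. It uses the fact that a set spans $P_3$ exactly when its coefficient vectors span $\mathbb{R}^3$, that is, when elimination leaves three non-zero rows.

```python
from fractions import Fraction


def rank(vectors):
    """Number of non-zero rows left after exact Gaussian elimination on `vectors`."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def coefficients(poly, length=3):
    """Coefficient vector (a, b, c) of a + b x + c x^2, given as a dict {power: coefficient}."""
    return [poly.get(k, 0) for k in range(length)]


def check_P3(polys):
    """Return (linearly independent?, spans P3?) for a list of polynomials in P3."""
    r = rank([coefficients(p) for p in polys])
    return r == len(polys), r == 3


print(check_P3([{0: 1}, {1: 1}, {2: 1}]))        # (True, True):   {1, x, x^2}
print(check_P3([{0: 1}, {1: 1}]))                # (True, False):  {1, x}
print(check_P3([{0: 1}, {0: 2, 2: 1}, {2: 1}]))  # (False, False): {1, 2 + x^2, x^2}
```

The three outputs match parts (a), (b) and (c): only the first set satisfies both conditions.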


Figure 9 An ellipse with non-standard basis shown


Exercise C61 


Determine whether the set $\{\ldots\}$ is a basis for $M_{2,2}$.


3.3 Standard bases 


You may have noticed that some sets of basis vectors seem to make the calculations in vector spaces particularly simple. For $\mathbb{R}^2$ this set is $\{(1,0), (0,1)\}$, for $\mathbb{R}^3$ it is $\{(1,0,0), (0,1,0), (0,0,1)\}$, and so on.


The representation of a vector in terms of these bases is straightforward. For example, in $\mathbb{R}^2$
$$(x,y) = x(1,0) + y(0,1),$$
and in $\mathbb{R}^3$
$$(x,y,z) = x(1,0,0) + y(0,1,0) + z(0,0,1).$$


Because these bases are so simple, they are used frequently; they are called 
standard bases. 


Definition 
The standard basis for $\mathbb{R}^n$ is the set of $n$ vectors
$$\{(1,0,\ldots,0), (0,1,0,\ldots,0), \ldots, (0,\ldots,0,1)\}.$$


The standard basis for $\mathbb{R}^n$ seems so natural that you may wonder why we do not use it all the time. In some physical situations, however, we may need to choose a different basis. For example, if we are looking at an ellipse centred at the origin, we may want to choose basis vectors along the major and minor axes of the ellipse. For the ellipse shown in Figure 9, it may be more convenient to choose the basis vectors $(1,1)$ and $(-1,1)$ rather than the standard ones, $(1,0)$ and $(0,1)$. Similarly, if we are considering a parallelogram, we may want to choose basis vectors along the sides of the parallelogram. In many vector spaces other than $\mathbb{R}^n$ there are particularly simple bases, which we call the standard bases for these spaces. Here are some examples.


$P_n$: $\{1, x, x^2, \ldots, x^{n-1}\}$


$M_{2,2}$: $\left\{\begin{pmatrix}1&0\\0&0\end{pmatrix}, \begin{pmatrix}0&1\\0&0\end{pmatrix}, \begin{pmatrix}0&0\\1&0\end{pmatrix}, \begin{pmatrix}0&0\\0&1\end{pmatrix}\right\}$


$\mathbb{C}$: $\{1, i\}$


If we write a vector in $\mathbb{R}^2$ as $(x,y)$, then $x$ and $y$ are the components, or coordinates, of the vector with respect to the standard basis vectors; that is,
$$(x,y) = x(1,0) + y(0,1).$$


However, we need some way of indicating what the coordinates of a vector 
are with respect to non-standard basis vectors. We use the following 
notation. 


Definitions 


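Let $E = \{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ be a basis for a vector space $V$, and suppose that
$$\mathbf{v} = v_1\mathbf{e}_1 + v_2\mathbf{e}_2 + \cdots + v_n\mathbf{e}_n,$$
where $v_1, v_2, \ldots, v_n \in \mathbb{R}$.

Then the E-coordinate representation of $\mathbf{v}$ is
$$\mathbf{v} = (v_1, v_2, \ldots, v_n)_E.$$


We call $v_1, v_2, \ldots, v_n$ the coordinates of $\mathbf{v}$ with respect to the basis $E$, or, more briefly, the E-coordinates of $\mathbf{v}$.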


Remarks 


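1. We usually omit the subscript if $E$ is the standard basis.


2. We write the basis vectors as $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ rather than $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ to avoid confusion between the basis vectors and the coordinates $v_1, v_2, \ldots, v_n$ of a vector $\mathbf{v}$.

3. We can denote the E-coordinates of a vector $\mathbf{v}_j$ by $v_{1j}, v_{2j}, \ldots, v_{nj}$, so we write $\mathbf{v}_j = v_{1j}\mathbf{e}_1 + v_{2j}\mathbf{e}_2 + \cdots + v_{nj}\mathbf{e}_n$.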


4. Since E is a basis for V, the E-coordinate representation of a vector 
in V is unique. However, the order of the coordinates in such a 
representation depends on the order of the basis vectors. 


5. A non-zero vector has a different coordinate representation for each 
different basis. For the zero vector, the coordinates are always zero. 


You can think of the different representations of a vector as analogous 
to an amount of money being expressed in different currencies; in every 
currency, ‘no money’ is the same as ‘zero money’. 


6. If E is a standard basis, then we refer to the standard coordinate 
representation, standard coordinates, and so on. 


The following worked exercise shows this notation in practice. 




Worked Exercise C39 


Given the basis $E = \{(-1,2), (2,2)\}$ for $\mathbb{R}^2$, determine the standard coordinate representation of $(3,2)_E$.
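Converting E-coordinates to standard coordinates is just the linear combination in the definition above, $\mathbf{v} = v_1\mathbf{e}_1 + \cdots + v_n\mathbf{e}_n$. A minimal Python sketch (our own illustration; the function name `from_E_coordinates` is hypothetical), applied to the basis of Worked Exercise C39:

```python
def from_E_coordinates(coords, basis):
    """Standard coordinates of the vector with E-coordinates `coords`,
    where `basis` lists the basis vectors e1, ..., en in order."""
    n = len(basis[0])
    result = [0] * n
    for c, e in zip(coords, basis):
        for j in range(n):
            result[j] += c * e[j]  # add c times the j-th coordinate of e
    return tuple(result)


E = [(-1, 2), (2, 2)]
print(from_E_coordinates((3, 2), E))  # (1, 10), since 3(-1, 2) + 2(2, 2) = (1, 10)
```

The order of `basis` matters, as Remark 4 points out: listing the same two vectors as `[(2, 2), (-1, 2)]` would give `(4, 10)` instead.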


Exercise C62 
(a) Given the basis $E = \{(1,2), (-3,1)\}$ for $\mathbb{R}^2$, determine the standard coordinate representation of $(2,1)_E$.


(b) Given the basis $E = \{(1,0,2), (-1,1,3), (2,-2,0)\}$ for $\mathbb{R}^3$, determine the standard coordinate representation of $(1,1,-1)_E$.


We can also turn around the method in Worked Exercise C39 to express a 
given vector in terms of a non-standard basis. 


Worked Exercise C40 


For each of the following bases $E$ for $\mathbb{R}^2$, find the E-coordinate representation of the vector $(1,4)$.


(a) $E = \{(1,4), (4,-1)\}$ (b) $E = \{(-1,2), (2,2)\}$


(b) We write $(1,4) = \alpha(-1,2) + \beta(2,2) = (-\alpha + 2\beta, 2\alpha + 2\beta)$.
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} -\alpha + 2\beta &= 1\\ 2\alpha + 2\beta &= 4. \end{aligned}$$
Solving these equations gives $\alpha = 1$ and $\beta = 1$, so
$$(1,4) = 1(-1,2) + 1(2,2) = (1,1)_E.$$


Geometrically, by changing the basis we are changing the axes we are using. For example, in Worked Exercise C40(b) we are expressing the vector $(1,4)$ (given with respect to the standard basis) in terms of the new basis vectors $E = \{(-1,2), (2,2)\}$. The E-coordinates of this vector are $(1,1)$, representing one step along the $(-1,2)$-axis followed by one step along the $(2,2)$-axis. Figure 10 illustrates how this vector is represented with respect to these new axes.


Worked Exercise C41 


Find the E-coordinate representation of the vector $(-2,0,1)$ with respect to the basis $E = \{(1,0,0), (1,0,1), (2,1,-1)\}$ for $\mathbb{R}^3$.


Solution 


We write
$$(-2,0,1) = \alpha(1,0,0) + \beta(1,0,1) + \gamma(2,1,-1).$$
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} \alpha + \beta + 2\gamma &= -2\\ \gamma &= 0\\ \beta - \gamma &= 1. \end{aligned}$$
The second equation gives $\gamma = 0$. Substituting this value into the third equation gives $\beta = 1$, and substituting these values into the first equation gives $\alpha = -3$. So
$$\begin{aligned} (-2,0,1) &= -3(1,0,0) + 1(1,0,1) + 0(2,1,-1)\\ &= (-3,1,0)_E. \end{aligned}$$
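Going the other way means solving a system of linear equations, as in Worked Exercises C40(b) and C41. The Python sketch below (our own illustration, not the module's method; the name `to_E_coordinates` is hypothetical) does this by Gauss–Jordan elimination with exact fractions. It assumes that `basis` is a list of $n$ vectors forming a basis for $\mathbb{R}^n$, so that the solution exists and is unique; if no pivot can be found it raises `ValueError`.

```python
from fractions import Fraction


def to_E_coordinates(v, basis):
    """Solve v = a1 e1 + ... + an en for (a1, ..., an) by Gauss-Jordan elimination.

    Assumes `basis` is a list of n vectors forming a basis for R^n.
    """
    n = len(basis)
    # Augmented matrix: row i holds the i-th coordinates (e1[i], ..., en[i] | v[i]).
    m = [[Fraction(e[i]) for e in basis] + [Fraction(v[i])] for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the given vectors do not form a basis")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]  # scale so that the pivot is 1
        for r in range(n):
            if r != col and m[r][col] != 0:  # clear the rest of this column
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][n] for i in range(n))


E = [(1, 0, 0), (1, 0, 1), (2, 1, -1)]
print([str(a) for a in to_E_coordinates((-2, 0, 1), E)])              # ['-3', '1', '0']
print([str(a) for a in to_E_coordinates((1, 4), [(-1, 2), (2, 2)])])  # ['1', '1']
```

Exact fractions are used so that answers such as $\tfrac{6}{7}$ come out exactly rather than as rounded decimals.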


Figure 10 Changing the axes


Exercise C63 


(a) Find the E-coordinate representation of the vector $(5,-4)$ with respect to the basis $E = \{(1,2), (-3,1)\}$ for $\mathbb{R}^2$.


(b) Find the E-coordinate representation of the vector $(-3,5,7)$ with respect to the basis $E = \{(1,0,2), (-1,1,3), (2,-2,0)\}$ for $\mathbb{R}^3$.


3.4 Dimension 


You may have noticed in the previous subsection that all the bases you met for $\mathbb{R}^2$ contained two vectors, all the bases for $\mathbb{R}^3$ contained three vectors, and so on. This should correspond to your intuitive idea of dimension, namely that $\mathbb{R}$ is one-dimensional, $\mathbb{R}^2$ is two-dimensional, and so on.


For example, among the bases you met were the following.
$\mathbb{R}^2$: $\{(1,0), (0,1)\}$, …
$\mathbb{R}^3$: $\{(1,0,0), (0,1,0), (0,0,1)\}$, $\{(1,2,1), (1,0,-1), (0,3,1)\}$.
$\mathbb{R}^4$: $\{(1,0,2,0), (0,1,0,3), (0,0,1,2), (2,0,-1,0)\}$,
$\{(1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1)\}$.


It is not a coincidence that every basis for $\mathbb{R}^2$ contains exactly two vectors, and every basis for $\mathbb{R}^3$ contains exactly three vectors. The main theorem in this section, the Basis Theorem, states that if $V$ is any finite-dimensional vector space, then every basis for $V$ contains the same number of vectors. Before we prove this, we must define what we mean by a finite-dimensional vector space.


Definitions 


Let V be a vector space. Then V is finite-dimensional if it contains 
a finite set of vectors S that forms a basis for V. If no such set exists, 
then V is infinite-dimensional. 


Examples of infinite-dimensional vector spaces are $\mathbb{R}^\infty$ and the set of all polynomials (of any degree). On the other hand, the set containing just the zero vector is a zero-dimensional vector space, which has the empty set as its basis.


In order to prove that every basis for a finite-dimensional vector space V 
contains the same number of vectors, we first prove the following useful 
result. 


Theorem C22 


Let $E = \{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ be a basis for a vector space $V$, and let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m\}$ be a set of $m$ vectors in $V$, where $m > n$. Then $S$ is a linearly dependent set.


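Proof We assume that the conditions of Theorem C22 hold and show that this implies that $S$ is linearly dependent.

Let $E = \{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ be a basis for $V$, and let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m\}$ be a set of $m$ vectors in $V$. Then each of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ can be written as a linear combination of the vectors in $E$; that is,
$$\begin{aligned}
\mathbf{v}_1 &= v_{11}\mathbf{e}_1 + v_{21}\mathbf{e}_2 + \cdots + v_{n1}\mathbf{e}_n,\\
\mathbf{v}_2 &= v_{12}\mathbf{e}_1 + v_{22}\mathbf{e}_2 + \cdots + v_{n2}\mathbf{e}_n,\\
&\ \ \vdots\\
\mathbf{v}_m &= v_{1m}\mathbf{e}_1 + v_{2m}\mathbf{e}_2 + \cdots + v_{nm}\mathbf{e}_n,
\end{aligned}$$
for some numbers $v_{11}, \ldots, v_{nm} \in \mathbb{R}$.
To show that $S$ is linearly dependent, we must find real numbers $\alpha_1, \alpha_2, \ldots, \alpha_m$, not all zero, such that
$$\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_m\mathbf{v}_m = \mathbf{0}. \qquad (1)$$
Using the first system of equations, we can rewrite equation (1) as
$$\begin{aligned}
&(\alpha_1 v_{11} + \alpha_2 v_{12} + \cdots + \alpha_m v_{1m})\mathbf{e}_1\\
&\quad + (\alpha_1 v_{21} + \alpha_2 v_{22} + \cdots + \alpha_m v_{2m})\mathbf{e}_2\\
&\quad + \cdots + (\alpha_1 v_{n1} + \alpha_2 v_{n2} + \cdots + \alpha_m v_{nm})\mathbf{e}_n = \mathbf{0}. \qquad (2)
\end{aligned}$$
Since $E$ is a basis, the set of vectors $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ is linearly independent, so equation (2) holds exactly when every bracketed coefficient is zero. It follows that we can find real numbers $\alpha_1, \alpha_2, \ldots, \alpha_m$, not all zero, that satisfy equation (2) if and only if the following system of equations has a non-zero solution for $\alpha_1, \alpha_2, \ldots, \alpha_m$:
$$\begin{aligned}
v_{11}\alpha_1 + v_{12}\alpha_2 + \cdots + v_{1m}\alpha_m &= 0\\
v_{21}\alpha_1 + v_{22}\alpha_2 + \cdots + v_{2m}\alpha_m &= 0\\
&\ \ \vdots\\
v_{n1}\alpha_1 + v_{n2}\alpha_2 + \cdots + v_{nm}\alpha_m &= 0.
\end{aligned}$$


This is a system of $n$ linear equations in $m$ unknowns with $m > n$, so there are more unknowns than equations.


In Unit C1 you saw that a consistent system with more unknowns than equations has an infinite solution set. The system above is consistent because it is homogeneous, and therefore it has an infinite solution set.


Such a system of linear equations has a non-trivial solution, that is, a solution in which some of the variables are non-zero. Therefore the set $S$ containing $m > n$ vectors is linearly dependent. This proves the theorem. ■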




For example, $\mathbb{R}^3$ has three vectors in its standard basis, so, by Theorem C22, the set
$$\{(1,1,0), (0,-2,1), (0,0,1), (1,1,2)\}$$
is linearly dependent because it contains more than three vectors. In fact,
$$(1,1,0) + 0(0,-2,1) + 2(0,0,1) - (1,1,2) = (0,0,0).$$
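The proof of Theorem C22 is constructive: any non-trivial solution of the homogeneous system gives the coefficients of a linear dependence. The Python sketch below (our own illustration; the name `dependency` is hypothetical) row-reduces that system exactly, sets the first free unknown equal to 1 and any other free unknowns equal to 0, and reads off the rest.

```python
from fractions import Fraction


def dependency(vectors):
    """Coefficients (a1, ..., am), not all zero, with a1 v1 + ... + am vm = 0,
    or None if the vectors are linearly independent."""
    m, n = len(vectors), len(vectors[0])
    # Homogeneous system: equation i says sum_j a_j * (i-th coordinate of v_j) = 0.
    rows = [[Fraction(v[i]) for v in vectors] for i in range(n)]
    pivot_cols = []
    r = 0
    for col in range(m):
        pivot = next((i for i in range(r, n) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: a_col is a free unknown
        rows[r], rows[pivot] = rows[pivot], rows[r]
        p = rows[r][col]
        rows[r] = [x / p for x in rows[r]]
        for i in range(n):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(col)
        r += 1
        if r == n:
            break  # every equation has a pivot; the remaining unknowns are free
    free = [c for c in range(m) if c not in pivot_cols]
    if not free:
        return None  # only the trivial solution exists
    # Put the first free unknown equal to 1 and the others equal to 0.
    coeffs = [Fraction(0)] * m
    coeffs[free[0]] = Fraction(1)
    for row_index, col in enumerate(pivot_cols):
        coeffs[col] = -rows[row_index][free[0]]
    return coeffs


S = [(1, 1, 0), (0, -2, 1), (0, 0, 1), (1, 1, 2)]
print([str(a) for a in dependency(S)])  # ['-1', '0', '-2', '1']
```

For the four vectors above it finds the relation stated in the text with every sign reversed, which is an equally valid linear dependence.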


Theorem C22 has the following immediate, and useful, consequence. 


Corollary C23 


Let $V$ be a vector space with a basis containing $n$ vectors. If a linearly independent subset of $V$ contains $m$ vectors, then $m \le n$.


This corollary provides the crucial steps in the proof of the Basis Theorem. 


Theorem C24 Basis Theorem 


Let V be a finite-dimensional vector space. Then every basis for V 
contains the same number of vectors. 


Proof We assume there are two bases with $n$ and $m$ vectors, respectively, and show that, since a basis is a linearly independent set, this implies that $n = m$.


Let $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ and $\{\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_m\}$ be two bases for a finite-dimensional vector space $V$.


Since $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ is a basis for $V$ and $\{\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_m\}$ is a linearly independent set, we have $m \le n$, by Corollary C23.


Similarly, since $\{\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_m\}$ is a basis for $V$ and $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ is linearly independent, we have $n \le m$, by Corollary C23.


Therefore $m = n$, so every basis contains the same number of vectors. ■


The Basis Theorem allows us to give a definition of the dimension of a 
finite-dimensional vector space, which agrees with our intuitive idea of 
dimension. 


Definition 


The dimension of a finite-dimensional vector space V, denoted by 
dim V, is the number of vectors in any basis for the space. 


So $\mathbb{R}^2$ has dimension 2 and $\mathbb{R}^3$ has dimension 3, as we would expect. More generally, $\mathbb{R}^n$ has dimension $n$, since the standard basis for $\mathbb{R}^n$ has $n$ vectors. It follows from Theorem C24 that every basis for $\mathbb{R}^n$ contains exactly $n$ vectors. The strategy for checking whether a set of vectors is a basis (Strategy C8) can now be greatly simplified when the vector space is $\mathbb{R}^n$. The result that we need is stated in the next theorem.


Theorem C25 


Let V be an n-dimensional vector space. Then any set of n linearly 
independent vectors in V is a basis for V. 


Proof We give a proof by contradiction.


Suppose that the set $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ of $n$ linearly independent vectors does not span $V$. Then there exists a vector $\mathbf{v}$ in $V$ that cannot be written as a linear combination of the vectors in $S$.


So, if
$$\alpha_1\mathbf{v}_1 + \cdots + \alpha_n\mathbf{v}_n + \alpha_{n+1}\mathbf{v} = \mathbf{0},$$
then $\alpha_{n+1} = 0$, since otherwise we could divide by $\alpha_{n+1}$ and write $\mathbf{v}$ as a linear combination of the vectors in $S$; and then $\alpha_1 = \cdots = \alpha_n = 0$, since $S$ is linearly independent. Hence $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n, \mathbf{v}\}$ is a linearly independent set of $n + 1$ vectors.


But by Theorem C22, any set of more than $n$ vectors in $V$ is linearly dependent. This is a contradiction, so our assumption must be false, and $S$ does span $V$.


Therefore every set of $n$ linearly independent vectors in $V$ is a basis for $V$. ■


This means that to check whether a set $S$ is a basis for $\mathbb{R}^n$, we no longer have to check that $S$ spans $\mathbb{R}^n$: we know that it does if it is linearly independent and contains $n$ vectors. We can simplify Strategy C8.


In fact, we can use this simplified strategy to determine whether a set of 
vectors is a basis for any vector space V if we know the dimension of V. 


Strategy C9 


To determine whether a set of vectors S in a vector space V of 
dimension n is a basis, check the following conditions. 


(1) S contains n vectors. 
(2) S is linearly independent. 


If both (1) and (2) hold, then S is a basis for V. 
If either (1) or (2) does not hold, then S is not a basis for V. 
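For subsets of $\mathbb{R}^n$, Strategy C9 is easy to automate. The sketch below (our own illustration; the names `rank` and `is_basis` are hypothetical) checks condition (1) by counting and condition (2) by exact row reduction, using the fact that the vectors are linearly independent exactly when elimination leaves as many non-zero rows as there are vectors.

```python
from fractions import Fraction


def rank(vectors):
    """Number of non-zero rows left after exact Gaussian elimination on `vectors`."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_basis(vectors, n):
    """Strategy C9 for a list of vectors in R^n, which has dimension n."""
    if len(vectors) != n:       # condition (1): S contains n vectors
        return False
    return rank(vectors) == n   # condition (2): S is linearly independent


print(is_basis([(1, 1, 1), (0, 2, 1), (-1, 1, 0)], 3))                         # False
print(is_basis([(1, 0, 2, 0), (0, 1, 0, 3), (0, 0, 1, 2), (2, 0, -1, 0)], 4))  # True
```

Checking the count first mirrors the point made after the strategy: sets of the wrong size are rejected before any elimination is done.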




Exercise C64 


Use Strategy C9 to determine which of the following sets are bases for $\mathbb{R}^3$.
(a) $\{(1,2,1), (1,0,-1)\}$ (b) $\{(1,0,1), (1,0,-1), (0,1,1)\}$

(c) $\{(1,-1,0), (2,1,4), (3,0,4)\}$

(d) $\{(1,0,0), (0,1,0), (0,0,1), (1,1,1)\}$


Strategy C9 is easier to use than Strategy C8 because you can eliminate 
sets that do not contain the right number of vectors. Furthermore, you do 
not need to check spanning, which is usually harder than checking for 
linear independence. 


To be able to apply Strategy C9 to vector spaces other than $\mathbb{R}^n$, we need to know the dimensions of other vector spaces.


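In Subsection 3.3 we listed the standard bases for some vector spaces as follows.


$\mathbb{R}^n$: $\{(1,0,\ldots,0), (0,1,0,\ldots,0), \ldots, (0,\ldots,0,1)\}$.
$P_n$: $\{1, x, x^2, \ldots, x^{n-1}\}$.
$M_{2,2}$: $\left\{\begin{pmatrix}1&0\\0&0\end{pmatrix}, \begin{pmatrix}0&1\\0&0\end{pmatrix}, \begin{pmatrix}0&0\\1&0\end{pmatrix}, \begin{pmatrix}0&0\\0&1\end{pmatrix}\right\}$.
$\mathbb{C}$: $\{1, i\}$.

We can see that the dimension of $P_n$ is $n$, so the dimension of $P_2$ is 2, the dimension of $P_3$ is 3, and so on.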


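Similarly, the dimension of $M_{2,2}$ is 4, and, in general, the dimension of $M_{m,n}$ is $mn$. For example, $M_{2,3}$ has dimension 6: a basis is
$$\left\{\begin{pmatrix}1&0&0\\0&0&0\end{pmatrix}, \begin{pmatrix}0&1&0\\0&0&0\end{pmatrix}, \begin{pmatrix}0&0&1\\0&0&0\end{pmatrix}, \begin{pmatrix}0&0&0\\1&0&0\end{pmatrix}, \begin{pmatrix}0&0&0\\0&1&0\end{pmatrix}, \begin{pmatrix}0&0&0\\0&0&1\end{pmatrix}\right\}.$$
Finally, the dimension of $\mathbb{C}$ is 2.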
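The standard basis of $M_{m,n}$ has one matrix for each position $(i, j)$, with a 1 in that position and 0 everywhere else, which is why $\dim M_{m,n} = mn$. A short Python sketch (our own illustration; the function name is hypothetical) that builds it row by row, in the same order as the $M_{2,3}$ example above:

```python
def standard_basis_matrices(m, n):
    """Standard basis of M_{m,n}: one matrix per position (i, j),
    with entry 1 in that position and 0 everywhere else."""
    basis = []
    for i in range(m):
        for j in range(n):
            E = [[0] * n for _ in range(m)]  # fresh m x n zero matrix
            E[i][j] = 1
            basis.append(E)
    return basis


B = standard_basis_matrices(2, 3)
print(len(B))  # 6
print(B[0])    # [[1, 0, 0], [0, 0, 0]]
print(B[3])    # [[0, 0, 0], [1, 0, 0]]
```

Each matrix is built from a fresh list of rows, so changing one entry cannot affect the other basis matrices.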


Exercise C65 


Use Strategy C9 to determine whether each of the following sets is a basis for the given vector space.


(a) The set $S$ for $M_{2,2}$, where
$$S = \{\ldots\}.$$


(b) The set $S = \{2 + x, 1 - x\}$ for $P_2$.


We end this section by showing that a linearly independent subset of a 
vector space can always be extended to give a basis for the vector space. 
This result will be useful in Unit C3 Linear transformations. 


Theorem C26 


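Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m\}$ be a linearly independent subset of an $n$-dimensional vector space $V$, where $m < n$. Then there exist vectors $\mathbf{v}_{m+1}, \ldots, \mathbf{v}_n$ in $V$ such that $\{\mathbf{v}_1, \ldots, \mathbf{v}_m, \mathbf{v}_{m+1}, \ldots, \mathbf{v}_n\}$ is a basis for $V$.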


Proof Since $m < n$, $S$ is not a basis for $V$, by the Basis Theorem (Theorem C24) and Theorem C25. As $S$ is linearly independent but is not a basis, it does not span $V$; thus there is a vector $\mathbf{v}_{m+1}$ in $V$ that cannot be expressed as a linear combination of the vectors in $S$. As in the proof of Theorem C25, it follows that $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{m+1}\}$ is linearly independent.


We keep adding vectors in this way until we obtain a linearly independent set with $n$ vectors. This is a basis, by Theorem C25. ■
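In $\mathbb{R}^n$ the proof of Theorem C26 can be carried out mechanically: try candidate vectors one at a time and keep each one that is not a linear combination of the vectors chosen so far. The Python sketch below makes one particular choice that the proof does not prescribe: it tries the standard basis vectors in order. Since these span $\mathbb{R}^n$, the process always reaches $n$ vectors. The names `rank` and `extend_to_basis` are our own.

```python
from fractions import Fraction


def rank(vectors):
    """Number of non-zero rows left after exact Gaussian elimination on `vectors`."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def extend_to_basis(S, n):
    """Extend the linearly independent list S of vectors in R^n to a basis for R^n.

    Candidates are the standard basis vectors, tried in order; each is kept
    only if it is not a linear combination of the vectors chosen so far.
    """
    basis = [tuple(v) for v in S]
    if rank(basis) != len(basis):
        raise ValueError("S must be linearly independent")
    for j in range(n):
        if len(basis) == n:
            break
        e = tuple(1 if k == j else 0 for k in range(n))  # j-th standard basis vector
        if rank(basis + [e]) == len(basis) + 1:  # e is not in the span so far
            basis.append(e)
    return basis


print(extend_to_basis([(1, 0, 1), (2, 0, 3)], 3))  # [(1, 0, 1), (2, 0, 3), (0, 1, 0)]
```

In the example, $(1,0,0)$ is rejected because it lies in the plane $y = 0$ spanned by the first two vectors, and $(0,1,0)$ is accepted.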


4 Subspaces 


In this section you will meet subsets of vector spaces that are themselves 
vector spaces. 


4.1 Definition 


You have seen examples where a set of vectors does not span the whole of 
a vector space, but spans only a proper subset of that vector space, for 
example in Worked Exercise C32 and Exercise C56. In particular, you saw 
the following. 


• In $\mathbb{R}^2$, the set of vectors $\{(1,1)\}$ is a spanning set for the line through the origin with equation $y = x$; this is a one-dimensional subset of $\mathbb{R}^2$.


• In $\mathbb{R}^3$, the set of vectors $\{(1,0,0)\}$ is a spanning set for the $x$-axis; this is a one-dimensional subset of $\mathbb{R}^3$.


• In $\mathbb{R}^3$, the set of vectors $\{(1,0,1), (2,0,3)\}$ is a spanning set for the plane $y = 0$; this is a two-dimensional subset of $\mathbb{R}^3$.


In fact, any proper subset of $\mathbb{R}^3$ that is the span of a set of vectors must take one of the following forms: $\{\mathbf{0}\}$, a line through the origin (a one-dimensional subset), or a plane through the origin (a two-dimensional subset).


When you met these examples, you may have asked yourself whether these 
subsets are themselves vector spaces. In fact, they are; we call such subsets 
subspaces. 




Definition 


A subset $S$ of a vector space $V$ is a subspace of $V$ if $S$ is itself a vector space under vector addition and scalar multiplication as defined in $V$.


In order to prove that a subset $S$ is a vector space, we must show that it satisfies all the axioms in Subsection 1.2. In practice, however, we do not need to check them all, as many of them carry over from $V$; that is, if they are true for $V$, then they are also true for $S$. For example, the commutativity axiom (A5) states that $\mathbf{v}_1 + \mathbf{v}_2 = \mathbf{v}_2 + \mathbf{v}_1$ for all $\mathbf{v}_1, \mathbf{v}_2 \in V$; since all the vectors in $S$ are also in $V$, this axiom holds for $S$.


Provided that $S$ is non-empty, the only axioms that need to be checked are the closure axioms (A1 and S1), because all the other axioms follow from $V$. If the zero vector is in $S$, then $S$ is non-empty. Therefore we can replace the condition that $S$ is non-empty by the condition that the zero vector is in $S$. This gives the following theorem; you are asked to prove this as an exercise in the additional exercises booklet for this unit.


Theorem C27 


A subset S of a vector space V is a subspace of V if it satisfies the 
following conditions. 


(a) $\mathbf{0} \in S$.
(b) S is closed under vector addition. 


(c) S is closed under scalar multiplication. 


This theorem allows us to give a strategy for testing whether a given 
subset of a vector space is a subspace. 


Strategy C10 


To test whether a given subset S of a vector space V is a subspace 
of V, check the following conditions. 


(1) $\mathbf{0} \in S$ (zero vector).

(2) If $\mathbf{v}_1, \mathbf{v}_2 \in S$, then $\mathbf{v}_1 + \mathbf{v}_2 \in S$ (vector addition).

(3) If $\mathbf{v} \in S$ and $\alpha \in \mathbb{R}$, then $\alpha\mathbf{v} \in S$ (scalar multiplication).

If (1), (2) and (3) hold, then $S$ is a subspace of $V$.

If any of (1), (2) or (3) does not hold, then $S$ is not a subspace of $V$.


The following worked exercises and exercises illustrate how this strategy is 
used to show that a given set is a subspace. 




Worked Exercise C42 


Show that the set of vectors $S = \{(x, 3x) : x \in \mathbb{R}\}$ is a subspace of $\mathbb{R}^2$. Sketch this subspace.


Solution
The set $S$ is a subset of $\mathbb{R}^2$, so we use Strategy C10.
We first check condition (1): $\mathbf{0} \in S$.
If $x = 0$, then $(x, 3x) = (0,0)$, so $S$ contains the zero vector of $\mathbb{R}^2$.
We check condition (2): if $\mathbf{v}_1, \mathbf{v}_2 \in S$, then $\mathbf{v}_1 + \mathbf{v}_2 \in S$.
Let $\mathbf{v}_1 = (x_1, 3x_1)$ and $\mathbf{v}_2 = (x_2, 3x_2)$ belong to $S$. Then
$$\begin{aligned}
\mathbf{v}_1 + \mathbf{v}_2 &= (x_1, 3x_1) + (x_2, 3x_2)\\
&= (x_1 + x_2, 3x_1 + 3x_2)\\
&= (x_1 + x_2, 3(x_1 + x_2)).
\end{aligned}$$


This vector has the correct form for a vector in $S$, since $x_1 + x_2 \in \mathbb{R}$, so $S$ is closed under vector addition.


We check condition (3): if $\mathbf{v} \in S$ and $\alpha \in \mathbb{R}$, then $\alpha\mathbf{v} \in S$.
Let $\mathbf{v} = (x, 3x) \in S$ and $\alpha \in \mathbb{R}$. Then
$$\alpha\mathbf{v} = \alpha(x, 3x) = (\alpha x, \alpha(3x)) = (\alpha x, 3(\alpha x)).$$


This vector has the correct form for a vector in $S$, since $\alpha x \in \mathbb{R}$, so $S$ is closed under scalar multiplication.


Since conditions (1), (2) and (3) are satisfied, $S$ is a subspace of $\mathbb{R}^2$.
This subspace is the line through the origin with equation $y = 3x$.


Exercise C66 


Show that the set of vectors $S = \{(x, -2x) : x \in \mathbb{R}\}$ is a subspace of $\mathbb{R}^2$.




Worked Exercise C43 


Show that the set of vectors $S = \{(x, y, 2x - 3y) : x, y \in \mathbb{R}\}$ is a subspace of $\mathbb{R}^3$.


Solution
The set $S$ is a subset of $\mathbb{R}^3$, so we use Strategy C10.
If $x = y = 0$, then $(x, y, 2x - 3y) = (0,0,0)$, so $S$ contains the zero vector of $\mathbb{R}^3$.
Let $\mathbf{v}_1 = (x_1, y_1, 2x_1 - 3y_1)$ and $\mathbf{v}_2 = (x_2, y_2, 2x_2 - 3y_2)$ belong to $S$. Then
$$\begin{aligned}
\mathbf{v}_1 + \mathbf{v}_2 &= (x_1, y_1, 2x_1 - 3y_1) + (x_2, y_2, 2x_2 - 3y_2)\\
&= (x_1 + x_2, y_1 + y_2, 2x_1 - 3y_1 + 2x_2 - 3y_2)\\
&= (x_1 + x_2, y_1 + y_2, 2(x_1 + x_2) - 3(y_1 + y_2)).
\end{aligned}$$
This vector has the correct form for a vector in $S$, since $x_1 + x_2, y_1 + y_2 \in \mathbb{R}$, so $S$ is closed under vector addition.
Let $\mathbf{v} = (x, y, 2x - 3y) \in S$ and $\alpha \in \mathbb{R}$. Then
$$\begin{aligned}
\alpha\mathbf{v} &= \alpha(x, y, 2x - 3y)\\
&= (\alpha x, \alpha y, \alpha(2x - 3y))\\
&= (\alpha x, \alpha y, 2(\alpha x) - 3(\alpha y)).
\end{aligned}$$
This vector has the correct form for a vector in $S$, since $\alpha x, \alpha y \in \mathbb{R}$, so $S$ is closed under scalar multiplication.


Since conditions (1), (2) and (3) are satisfied, $S$ is a subspace of $\mathbb{R}^3$.


$S$ is the set of points in $\mathbb{R}^3$ satisfying $z = 2x - 3y$; it is the plane through the origin with equation $2x - 3y - z = 0$.


Strategy C10 is used in much the same way to determine whether a given subset is a subspace. However, since a subset fails to be a subspace as soon as any one of the conditions fails, it may be enough to check only one of the conditions.


Worked Exercise C44 


For each of the following, determine whether the set $S$ is a subspace of the vector space $\mathbb{R}^3$.


(a) $S = \{(x, y, x - y + 2) : x, y \in \mathbb{R}\}$ (b) $S = \{(z - y, y, z) : y, z \in \mathbb{R}\}$




Solution 
In each case the set S is a subset of R3, so we use Strategy C10. 


(a) If $\mathbf{0} \in S$, then $(x, y, x - y + 2) = (0,0,0)$ for some numbers $x$ and $y$. Equating corresponding coordinates, we obtain the system
$$\begin{aligned} x &= 0\\ y &= 0\\ x - y + 2 &= 0. \end{aligned}$$
This system is inconsistent, so has no solution: the first two equations give $x = y = 0$, and then the third equation reads $2 = 0$. Therefore $\mathbf{0}$ does not belong to $S$ and condition (1) is not satisfied. Hence $S$ is not a subspace of $\mathbb{R}^3$.


Since condition (1) is not satisfied, we do not need to check conditions (2) and (3). However, neither is satisfied, and either one could have been used to show that $S$ is not a subspace.
(b) If $y = z = 0$, then $(z - y, y, z) = (0,0,0)$, so $S$ contains the zero vector of $\mathbb{R}^3$.
Let $\mathbf{v}_1 = (z_1 - y_1, y_1, z_1)$ and $\mathbf{v}_2 = (z_2 - y_2, y_2, z_2)$ belong to $S$. Then
$$\begin{aligned}
\mathbf{v}_1 + \mathbf{v}_2 &= (z_1 - y_1, y_1, z_1) + (z_2 - y_2, y_2, z_2)\\
&= (z_1 - y_1 + z_2 - y_2, y_1 + y_2, z_1 + z_2)\\
&= ((z_1 + z_2) - (y_1 + y_2), y_1 + y_2, z_1 + z_2).
\end{aligned}$$
This vector has the correct form for a vector in $S$, since $y_1 + y_2, z_1 + z_2 \in \mathbb{R}$, so $S$ is closed under vector addition.
Let $\mathbf{v} = (z - y, y, z) \in S$ and $\alpha \in \mathbb{R}$. Then
$$\begin{aligned}
\alpha\mathbf{v} &= \alpha(z - y, y, z)\\
&= (\alpha(z - y), \alpha y, \alpha z)\\
&= (\alpha z - \alpha y, \alpha y, \alpha z).
\end{aligned}$$
This vector has the correct form for a vector in $S$, since $\alpha y, \alpha z \in \mathbb{R}$, so $S$ is closed under scalar multiplication.


Since conditions (1), (2) and (3) are satisfied, $S$ is a subspace of $\mathbb{R}^3$.


$S$ is the set of points in $\mathbb{R}^3$ satisfying $z = x + y$; it is the plane through the origin with equation $x + y - z = 0$.


Exercise C67 
For each of the following, determine whether the set $S$ is a subspace of the vector space $V$.
(a) $V = \mathbb{R}^2$, $S = \{(x, x + 2) : x \in \mathbb{R}\}$.
(b) $V = \mathbb{R}^4$, $S = \{(x, y, z, x + 2y - z) : x, y, z \in \mathbb{R}\}$.




Worked Exercise C45 


Determine whether the set $S = \{a\cos x : a \in \mathbb{R}\}$ is a subspace of the vector space $V = \{a\cos x + b\sin x : a, b \in \mathbb{R}\}$.


(We showed that V is a vector space in Subsection 1.2.) 


Exercise C68 


For each of the following, determine whether the set $S$ is a subspace of the vector space $V$.


(a) $V = P_3$, $S = \{a + bx : a, b \in \mathbb{R}\}$.
(b) $V = P_3$, $S = \{x + ax^2 : a \in \mathbb{R}\}$.


(c) $V = M_{2,2}$, $S = \{\ldots\}$.


The following theorem shows that the span of a non-empty finite subset of a vector space is always a subspace.


Theorem C28 


Let $S$ be a non-empty finite subset of a vector space $V$. Then $\langle S\rangle$ is a subspace of $V$.




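Proof Let $S = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ be a non-empty finite subset of a vector space $V$. Then the set $\langle S\rangle$ is a subset of $V$, since $V$ is closed under vector addition and scalar multiplication.

We apply Strategy C10.


The span $\langle S\rangle$ contains the zero vector, since $0\mathbf{u}_1 + 0\mathbf{u}_2 + \cdots + 0\mathbf{u}_n = \mathbf{0}$ belongs to $\langle S\rangle$.

Let $\mathbf{v}_1 = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + \cdots + a_n\mathbf{u}_n$ and $\mathbf{v}_2 = b_1\mathbf{u}_1 + b_2\mathbf{u}_2 + \cdots + b_n\mathbf{u}_n$ be any two vectors in $\langle S\rangle$. Then
$$\begin{aligned}
\mathbf{v}_1 + \mathbf{v}_2 &= (a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + \cdots + a_n\mathbf{u}_n) + (b_1\mathbf{u}_1 + b_2\mathbf{u}_2 + \cdots + b_n\mathbf{u}_n)\\
&= (a_1 + b_1)\mathbf{u}_1 + (a_2 + b_2)\mathbf{u}_2 + \cdots + (a_n + b_n)\mathbf{u}_n.
\end{aligned}$$
This is a member of $\langle S\rangle$, since it is a linear combination of $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$.
Hence $\langle S\rangle$ is closed under vector addition.

Let $\mathbf{v} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + \cdots + a_n\mathbf{u}_n$ and $\alpha \in \mathbb{R}$. Then
$$\begin{aligned}
\alpha\mathbf{v} &= \alpha(a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + \cdots + a_n\mathbf{u}_n)\\
&= (\alpha a_1)\mathbf{u}_1 + (\alpha a_2)\mathbf{u}_2 + \cdots + (\alpha a_n)\mathbf{u}_n.
\end{aligned}$$
This is a member of $\langle S\rangle$, since it is a linear combination of $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$.
Hence $\langle S\rangle$ is closed under scalar multiplication.


Thus $\langle S\rangle$ is a subspace of $V$. ■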


4.2 Bases and dimension 


In the previous subsection you saw several subspaces of finite-dimensional 
vector spaces. Since these subspaces are all vector spaces in their own 
right, they have bases and dimensions, and we look at these in this 
subsection. 


Let us return to two of our earlier examples from Section 2: Worked Exercises C32(a) and (b).


By Theorem C28, we now know that the set of vectors in $\mathbb{R}^2$ spanned by the set $S = \{(1,1)\}$ is a subspace of $\mathbb{R}^2$. In Worked Exercise C32(a) we saw that any vector in this subspace $\langle S\rangle$ can be written in the form $(a,a)$ for some $a \in \mathbb{R}$; since $(1,1)$ is non-zero, the set $\{(1,1)\}$ is also linearly independent, so it is a basis for this subspace. Thus the dimension of the subspace is 1. This agrees with our intuitive idea of dimension: we saw that these vectors form a line through the origin (the line $y = x$, as shown in Figure 11), which is one-dimensional.


Figure 11 The one-dimensional subspace $\langle\{(1,1)\}\rangle$


Similarly, from Worked Exercise C32(b) the set of vectors in $\mathbb{R}^3$ spanned by the set $S = \{(1,0,1), (2,0,3)\}$ is a subspace of $\mathbb{R}^3$. This subspace $\langle S\rangle$ consists of those points of $\mathbb{R}^3$ of the form $(x,0,z)$. Since the set $\{(1,0,1), (2,0,3)\}$ spans the subspace and is linearly independent (the vectors are not multiples of each other), it is a basis for this subspace. Since there are two vectors in the basis, the dimension of the subspace is 2.




Again, this links the idea of dimension in linear algebra to our intuitive idea of dimension: we saw that the subspace spanned by these two vectors is a plane through the origin, namely the plane $y = 0$ (as shown in Figure 12), which is two-dimensional. Since any vector in the subspace can be written in the form $(x,0,z)$, we can find another basis for this subspace by writing
$$(x,0,z) = x(1,0,0) + z(0,0,1).$$
This means that the set $\{(1,0,0), (0,0,1)\}$ is another spanning set for the subspace and, as it is also linearly independent, it is a basis for the subspace. This basis has the additional advantage that it is orthogonal, which means that the basis vectors are at right angles to each other. We will return to orthogonal bases in Section 5.


Figure 12 The two-dimensional subspace $\langle\{(1,0,1), (2,0,3)\}\rangle$


In the following worked exercises and exercises we consider various subspaces of $\mathbb{R}^3$ and $\mathbb{R}^4$ and look at their bases and dimensions.


Worked Exercise C46 


Find the equation of the subspace of $\mathbb{R}^3$ spanned by the set $\{(1,0,2), (2,3,4)\}$.


Solution 


The two vectors are not multiples of each other, so they are linearly independent.


Since $\{(1,0,2), (2,3,4)\}$ is a linearly independent set, the subspace it spans is a two-dimensional subspace of $\mathbb{R}^3$ (by Theorem C25).


A two-dimensional subspace is a plane, and since the zero vector is in the subspace this plane must pass through the origin.


The subspace is therefore a plane through the origin with equation
$$ax + by + cz = 0,$$
where $a$, $b$, $c$ are not all zero.


Since the vectors in the spanning set lie in the plane, the values of $a$, $b$ and $c$ must satisfy the system
$$\begin{aligned} a + 2c &= 0\\ 2a + 3b + 4c &= 0. \end{aligned}$$


The first of these equations gives $a = -2c$, and substituting this into the second equation gives $b = 0$; so the subspace is the plane with equation $-2cx + cz = 0$. Here $c \neq 0$, since otherwise $a$, $b$ and $c$ would all be zero, so we can divide by $c$ to obtain, equivalently,
$$z = 2x.$$
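In $\mathbb{R}^3$ there is a shortcut, which is not the method used in the worked exercise: the cross product of two linearly independent vectors is a non-zero vector $(a, b, c)$ whose scalar product with each of them is zero, so it gives suitable coefficients for the plane $ax + by + cz = 0$ directly. A Python sketch (our own illustration; the function names are hypothetical):

```python
def cross(u, v):
    """Cross product u x v of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Scalar product of two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))


u, v = (1, 0, 2), (2, 3, 4)
a, b, c = cross(u, v)
print(a, b, c)                               # -6 0 3, i.e. -6x + 3z = 0, or z = 2x
print(dot((a, b, c), u), dot((a, b, c), v))  # 0 0: both spanning vectors lie in the plane
```

The coefficients $(-6, 0, 3)$ are $3$ times $(-2, 0, 1)$, the choice $c = 1$ in the worked solution, so both routes give the same plane.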




Exercise C69 


Find the equation of the subspace of $\mathbb{R}^3$ spanned by the set $\{(\ldots, -2, 0), (0, 3, 3)\}$.


Worked Exercise C47 


Find a basis for the subspace $S = \{(z - y, y, z) : y, z \in \mathbb{R}\}$ of $\mathbb{R}^3$, and hence write down the dimension of $S$.


(We showed that $S$ is a subspace of $\mathbb{R}^3$ in Worked Exercise C44(b).)


Exercise C70 


Find a basis for the subspace
$$S = \{(x, y, z, x + 2y - z) : x, y, z \in \mathbb{R}\}$$
of $\mathbb{R}^4$, and hence write down the dimension of $S$.
(You showed that $S$ is a subspace of $\mathbb{R}^4$ in Exercise C67(b).)




Worked Exercise C48 


Find a basis for the plane $x - 3y + 2z = 0$ (a subspace of $\mathbb{R}^3$).
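For a plane $ax + by + cz = 0$ with $a \neq 0$, one basis comes from solving for $x$ in terms of the free variables $y$ and $z$. The sketch below is one standard approach, not necessarily the route the worked exercise takes, and the function name `plane_basis` is our own; it returns one possible basis, since any two linearly independent vectors in the plane would do.

```python
from fractions import Fraction


def plane_basis(a, b, c):
    """One basis for the plane ax + by + cz = 0 in R^3, assuming a != 0.

    Solving for x gives (x, y, z) = y(-b/a, 1, 0) + z(-c/a, 0, 1).
    """
    if a == 0:
        raise ValueError("this sketch assumes the coefficient of x is non-zero")
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    return [(-b / a, Fraction(1), Fraction(0)), (-c / a, Fraction(0), Fraction(1))]


for w in plane_basis(1, -3, 2):
    print([str(t) for t in w])
# ['3', '1', '0']
# ['-2', '0', '1']
```

The two vectors span the plane by construction and are linearly independent (compare their last two coordinates), so they form a basis; the plane is therefore two-dimensional.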


The following result, which will be used in Unit C3, has been illustrated by the worked exercises and exercises in this subsection. For example, in Worked Exercise C47 we had $V = \mathbb{R}^3$, so $\dim V = 3$ and
$$\dim S = 2 < \dim V.$$


Theorem C29 
The dimension of a subspace of a vector space $V$ is less than or equal to the dimension of $V$.


Proof Let $V$ be a vector space of dimension $n$, and let $S$ be a subspace of $V$. Suppose that the dimension of $S$ is $m$, and let $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_m\}$ be a basis for $S$. Then $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_m\}$ is a linearly independent set of vectors in $V$. Thus $m \le n$ by Corollary C23. ■


5 Orthogonal bases 


In this section you will look at bases in which the basis vectors are all 
orthogonal to each other. 


5.1 Orthogonal bases in $\mathbb{R}^3$


Suppose that we wish to express the vector $(10,0,4)$ in $\mathbb{R}^3$ in terms of the basis
$$\{(2,1,1), (1,-4,2), (-2,1,3)\}.$$
Using the method given in Subsection 2.1, we first write
$$(10,0,4) = \alpha_1(2,1,1) + \alpha_2(1,-4,2) + \alpha_3(-2,1,3).$$


Equating corresponding coordinates gives the system
$$\begin{aligned}
2\alpha_1 + \alpha_2 - 2\alpha_3 &= 10\\
\alpha_1 - 4\alpha_2 + \alpha_3 &= 0\\
\alpha_1 + 2\alpha_2 + 3\alpha_3 &= 4.
\end{aligned}$$


We can solve this system using Gauss–Jordan elimination or directly, to obtain the solution
$$\alpha_1 = 4, \quad \alpha_2 = \tfrac{6}{7}, \quad \alpha_3 = -\tfrac{4}{7}.$$
Thus
$$(10,0,4) = 4(2,1,1) + \tfrac{6}{7}(1,-4,2) - \tfrac{4}{7}(-2,1,3).$$
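Fractions like these are easy to get wrong by hand, so it is worth substituting them back. A short exact check in Python, using the standard `fractions` module (the variable names are our own):

```python
from fractions import Fraction

basis = [(2, 1, 1), (1, -4, 2), (-2, 1, 3)]
alphas = [Fraction(4), Fraction(6, 7), Fraction(-4, 7)]

# Form alpha1*(2,1,1) + alpha2*(1,-4,2) + alpha3*(-2,1,3) coordinate by coordinate.
combination = tuple(sum(a * e[i] for a, e in zip(alphas, basis)) for i in range(3))
print(combination == (10, 0, 4))  # True
```

Working with `Fraction` rather than floating-point numbers means the comparison with $(10, 0, 4)$ is exact.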


In this section you will see that there is a simpler method than this that 
involves scalar products of vectors. It can be used when, as here, the given 
basis is an orthogonal basis. In this subsection we concentrate on R3. 


We start by recalling from Unit A1 the definition of the scalar product in 
R3, and then use this to define the term orthogonal. 


Definitions 
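Let $\mathbf{v}_1 = (x_1, y_1, z_1)$ and $\mathbf{v}_2 = (x_2, y_2, z_2)$ be vectors in $\mathbb{R}^3$.
The scalar product of $\mathbf{v}_1$ and $\mathbf{v}_2$ is the real number
$$\mathbf{v}_1 \cdot \mathbf{v}_2 = x_1x_2 + y_1y_2 + z_1z_2.$$


The vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in $\mathbb{R}^3$ are orthogonal if $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$.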


For example, the vectors $\mathbf{v}_1 = (2,1,1)$ and $\mathbf{v}_2 = (-2,1,3)$ are orthogonal, since
$$\mathbf{v}_1 \cdot \mathbf{v}_2 = 2 \times (-2) + 1 \times 1 + 1 \times 3 = -4 + 1 + 3 = 0.$$


Geometrically, this means that the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ are at right angles to each other, as shown in Figure 13.


Exercise C71 


(a) Show that $(2,1,1)$ and $(1,-4,2)$ are orthogonal.
(b) Determine which pairs of the following vectors are orthogonal:
$$\mathbf{v}_1 = (-2,6,1), \quad \mathbf{v}_2 = (9,2,6), \quad \mathbf{v}_3 = (4,-15,-1).$$


Definition 


A set of vectors in $\mathbb{R}^3$ is an orthogonal set if every pair of distinct vectors in the set is orthogonal.


Figure 13 The orthogonal vectors $\mathbf{v}_1 = (2,1,1)$ and $\mathbf{v}_2 = (-2,1,3)$


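For example, $\{\mathbf{v}_1, \mathbf{v}_2\}$ is an orthogonal set if $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$; we have therefore shown above that $\{(2,1,1), (1,-4,2)\}$ is an orthogonal set.


Similarly, $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthogonal set if
$$\mathbf{v}_1 \cdot \mathbf{v}_2 = \mathbf{v}_1 \cdot \mathbf{v}_3 = \mathbf{v}_2 \cdot \mathbf{v}_3 = 0.$$
So $\{(2,1,1), (1,-4,2), (-2,1,3)\}$ is an orthogonal set since
$$(1,-4,2) \cdot (-2,1,3) = -2 - 4 + 6 = 0,$$
and we have shown that $(2,1,1) \cdot (1,-4,2) = 0$ and $(2,1,1) \cdot (-2,1,3) = 0$.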
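Checking an orthogonal set by hand means computing a scalar product for every pair of distinct vectors. The Python sketch below (our own illustration; the names `dot` and `is_orthogonal_set` are hypothetical) does exactly that, with `itertools.combinations` supplying each pair once.

```python
from itertools import combinations


def dot(u, v):
    """Scalar product x1*x2 + y1*y2 + z1*z2 (works for vectors of any common length)."""
    return sum(a * b for a, b in zip(u, v))


def is_orthogonal_set(vectors):
    """True if every pair of distinct vectors in the list has scalar product 0."""
    return all(dot(u, v) == 0 for u, v in combinations(vectors, 2))


S = [(2, 1, 1), (1, -4, 2), (-2, 1, 3)]
print([dot(u, v) for u, v in combinations(S, 2)])  # [0, 0, 0]
print(is_orthogonal_set(S))                         # True
```

A set of $k$ vectors needs $k(k-1)/2$ scalar products, so three vectors need the three checks made above.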


One of the most useful features of orthogonal sets of non-zero vectors is their linear independence. The following proof is for sets of three non-zero vectors, but a similar proof applies to other numbers of vectors and indeed to orthogonal sets of vectors in $\mathbb{R}^n$.


Theorem C30 


Let $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ be an orthogonal set of non-zero vectors in $\mathbb{R}^3$. Then $\mathbf{v}_1$, $\mathbf{v}_2$ and $\mathbf{v}_3$ are linearly independent.


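Proof To show that $\mathbf{v}_1$, $\mathbf{v}_2$ and $\mathbf{v}_3$ are linearly independent we need
to deduce that if $\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \alpha_3\mathbf{v}_3 = \mathbf{0}$ then $\alpha_1 = \alpha_2 = \alpha_3 = 0$, by using
the properties of scalar products.

Suppose that
$$\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \alpha_3\mathbf{v}_3 = \mathbf{0}.$$
We form the scalar product on both sides of the equation with $\mathbf{v}_1$:
$$\mathbf{v}_1 \cdot (\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \alpha_3\mathbf{v}_3) = \mathbf{v}_1 \cdot \mathbf{0} = 0.$$
Using the multiples property of the scalar product (Unit A1) we get
$$\alpha_1(\mathbf{v}_1 \cdot \mathbf{v}_1) + \alpha_2(\mathbf{v}_1 \cdot \mathbf{v}_2) + \alpha_3(\mathbf{v}_1 \cdot \mathbf{v}_3) = 0.$$

Since $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthogonal set of non-zero vectors in $\mathbb{R}^3$, we know
that
$$\mathbf{v}_1 \cdot \mathbf{v}_1 \neq 0, \quad \mathbf{v}_1 \cdot \mathbf{v}_2 = 0, \quad \mathbf{v}_1 \cdot \mathbf{v}_3 = 0$$
(the first because $\mathbf{v}_1$ is non-zero), so we have $\alpha_1(\mathbf{v}_1 \cdot \mathbf{v}_1) = 0$ and thus $\alpha_1 = 0$.
Similarly, we form the scalar product with $\mathbf{v}_2$ and $\mathbf{v}_3$:
$$\mathbf{v}_2 \cdot (\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \alpha_3\mathbf{v}_3) = \mathbf{v}_2 \cdot \mathbf{0} = 0,$$
which gives $\alpha_2 = 0$;
$$\mathbf{v}_3 \cdot (\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \alpha_3\mathbf{v}_3) = \mathbf{v}_3 \cdot \mathbf{0} = 0,$$
which gives $\alpha_3 = 0$.
We conclude that if $\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \alpha_3\mathbf{v}_3 = \mathbf{0}$ then $\alpha_1 = \alpha_2 = \alpha_3 = 0$.

Thus $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a linearly independent set. ■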
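The proof rests on one observation: in the table of all pairwise scalar products of an orthogonal set of non-zero vectors, every off-diagonal entry is 0 and every diagonal entry $\mathbf{v}_i \cdot \mathbf{v}_i$ is non-zero. The short Python sketch below is our own illustration (`scalar_product_table` is a hypothetical name); it builds that table for the orthogonal set $\{(2,1,1), (1,-4,2), (-2,1,3)\}$.

```python
def scalar_product(v, w):
    return sum(a * b for a, b in zip(v, w))


def scalar_product_table(vectors):
    """Row i, column j holds v_i . v_j."""
    return [[scalar_product(v, w) for w in vectors] for v in vectors]


table = scalar_product_table([(2, 1, 1), (1, -4, 2), (-2, 1, 3)])
for row in table:
    print(row)
# [6, 0, 0]
# [0, 21, 0]
# [0, 0, 14]
```

Taking the scalar product of $\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \alpha_3\mathbf{v}_3 = \mathbf{0}$ with $\mathbf{v}_i$ picks out row $i$ of this table, leaving only $\alpha_i(\mathbf{v}_i \cdot \mathbf{v}_i) = 0$.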


This result leads to the idea of an orthogonal basis. 


You have seen that any linearly independent set of three vectors in $\mathbb{R}^3$ is a
basis for $\mathbb{R}^3$. Now, if we have an orthogonal set of three non-zero vectors in
$\mathbb{R}^3$, then we know from Theorem C30 that the set is linearly independent,
so the set is a basis for $\mathbb{R}^3$. We call an orthogonal set that is a basis an
orthogonal basis.


Theorem C31 


Any orthogonal set of three non-zero vectors in $\mathbb{R}^3$ is an orthogonal
basis for $\mathbb{R}^3$.


For example, the standard basis $\{(1,0,0), (0,1,0), (0,0,1)\}$ for $\mathbb{R}^3$ is an
orthogonal basis, because these three basis vectors form an orthogonal set.
Similarly, the triple of vectors below is an orthogonal basis for $\mathbb{R}^3$ since the
vectors are orthogonal (as we saw above), there are three of them, and
they are all non-zero:

$$\{(2,1,1), (1,-4,2), (-2,1,3)\}.$$


One reason that orthogonal bases are so important is that it is usually
much easier to express a vector in terms of an orthogonal basis than in
terms of a general basis. At the beginning of this subsection we expressed
$(10,0,4)$ in terms of the orthogonal basis $\{(2,1,1), (1,-4,2), (-2,1,3)\}$ by
writing

$$(10,0,4) = \alpha_1(2,1,1) + \alpha_2(1,-4,2) + \alpha_3(-2,1,3) \qquad (3)$$

and solving the resulting system of linear equations.
However, there is a quicker way of solving equation (3) because the basis is 
an orthogonal basis. We take the scalar product of the vector (10, 0, 4) 
expressed as in equation (3) with each basis vector in turn, making use of 
the fact that the scalar product of orthogonal vectors is zero. 
First with $(2,1,1)$:

$$\begin{aligned}(2,1,1) \cdot (10,0,4) &= \alpha_1 (2,1,1) \cdot (2,1,1) + \alpha_2 (2,1,1) \cdot (1,-4,2) + \alpha_3 (2,1,1) \cdot (-2,1,3)\\ &= \alpha_1 (2,1,1) \cdot (2,1,1) + 0 + 0.\end{aligned}$$

The equation above gives

$$\alpha_1 = \frac{(2,1,1) \cdot (10,0,4)}{(2,1,1) \cdot (2,1,1)} = \frac{24}{6} = 4.$$

Similarly, taking the scalar product with $(1,-4,2)$:

$$(1,-4,2) \cdot (10,0,4) = 0 + \alpha_2 (1,-4,2) \cdot (1,-4,2) + 0.$$

Thus

$$\alpha_2 = \frac{(1,-4,2) \cdot (10,0,4)}{(1,-4,2) \cdot (1,-4,2)} = \frac{18}{21} = \frac{6}{7}.$$




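Finally, taking the scalar product with $(-2,1,3)$:

$$(-2,1,3) \cdot (10,0,4) = 0 + 0 + \alpha_3 (-2,1,3) \cdot (-2,1,3).$$

Thus

$$\alpha_3 = \frac{(-2,1,3) \cdot (10,0,4)}{(-2,1,3) \cdot (-2,1,3)} = \frac{-8}{14} = -\frac{4}{7}.$$

Therefore, we have $\alpha_1 = 4$, $\alpha_2 = \frac{6}{7}$ and $\alpha_3 = -\frac{4}{7}$, so

$$(10,0,4) = 4(2,1,1) + \tfrac{6}{7}(1,-4,2) - \tfrac{4}{7}(-2,1,3).$$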


This procedure works for orthogonal bases in general in $\mathbb{R}^3$ and is
summarised in the following strategy.


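Strategy C11
To express a vector $\mathbf{u}$ in $\mathbb{R}^3$ in terms of an orthogonal basis
$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$:

1. calculate $\alpha_1 = \dfrac{\mathbf{v}_1 \cdot \mathbf{u}}{\mathbf{v}_1 \cdot \mathbf{v}_1}$, $\alpha_2 = \dfrac{\mathbf{v}_2 \cdot \mathbf{u}}{\mathbf{v}_2 \cdot \mathbf{v}_2}$ and $\alpha_3 = \dfrac{\mathbf{v}_3 \cdot \mathbf{u}}{\mathbf{v}_3 \cdot \mathbf{v}_3}$

2. write $\mathbf{u} = \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \alpha_3\mathbf{v}_3$.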
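Strategy C11 needs only scalar products and divisions, so it is easy to automate. The Python sketch below is an illustration under our own assumptions: vectors are integer tuples, and `fractions.Fraction` keeps the coefficients exact (so $\frac{6}{7}$ is not rounded). The helper name `coefficients_in_orthogonal_basis` is hypothetical.

```python
from fractions import Fraction


def scalar_product(v, w):
    return sum(a * b for a, b in zip(v, w))


def coefficients_in_orthogonal_basis(u, basis):
    """Step 1 of Strategy C11: alpha_i = (v_i . u) / (v_i . v_i)."""
    return [Fraction(scalar_product(v, u), scalar_product(v, v)) for v in basis]


basis = [(2, 1, 1), (1, -4, 2), (-2, 1, 3)]
alphas = coefficients_in_orthogonal_basis((10, 0, 4), basis)
print(alphas)  # [Fraction(4, 1), Fraction(6, 7), Fraction(-4, 7)]
```

These agree with the values $\alpha_1 = 4$, $\alpha_2 = \frac{6}{7}$ and $\alpha_3 = -\frac{4}{7}$ of the worked example.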


Exercise C72 


(a) Verify that $\{(3,4,0), (8,-6,0), (0,0,5)\}$ is an orthogonal basis for $\mathbb{R}^3$.
(b) Express the vector (10,0,4) in terms of this basis. 


5.2 Orthogonal bases in $\mathbb{R}^n$


In this subsection we see how the definitions and results of the previous
subsection can be generalised to $\mathbb{R}^n$, for any positive integer $n$. We start
with the definition of the scalar product of vectors.


Definition 
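Let $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ and $\mathbf{w} = (w_1, w_2, \ldots, w_n)$ be vectors in $\mathbb{R}^n$.

The scalar product of $\mathbf{v}$ and $\mathbf{w}$ is the real number

$$\mathbf{v} \cdot \mathbf{w} = v_1w_1 + v_2w_2 + \cdots + v_nw_n.$$

For example, in $\mathbb{R}^5$ the scalar product of the vectors $\mathbf{v} = (1,2,3,4,5)$ and
$\mathbf{w} = (3,-4,0,3,-2)$ is
$$\begin{aligned}\mathbf{v} \cdot \mathbf{w} &= 1 \times 3 + 2 \times (-4) + 3 \times 0 + 4 \times 3 + 5 \times (-2)\\ &= 3 - 8 + 0 + 12 - 10 = -3.\end{aligned}$$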
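In code, the only change from $\mathbb{R}^3$ is that the sum runs over $n$ coordinates. In the Python sketch below (an illustration; the name `scalar_product` is our own) we also check that both vectors have the same number of coordinates, since the scalar product is only defined for two vectors in the same $\mathbb{R}^n$.

```python
def scalar_product(v, w):
    """v_1*w_1 + v_2*w_2 + ... + v_n*w_n for two vectors in the same R^n."""
    if len(v) != len(w):
        raise ValueError("vectors must lie in the same R^n")
    return sum(a * b for a, b in zip(v, w))


print(scalar_product((1, 2, 3, 4, 5), (3, -4, 0, 3, -2)))  # -3
```

Without the length check, Python's `zip` would silently stop at the end of the shorter vector and return a misleading number.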




Exercise C73 


Calculate the following scalar products. 
(a) $(1,2,-1,0) \cdot (0,-5,6,-3)$ in $\mathbb{R}^4$.
(b) $(1,2,3,4,5,6) \cdot (3,2,1,0,-1,-2)$ in $\mathbb{R}^6$.


We now see how the ideas of an orthogonal set and an orthogonal basis
extend to $\mathbb{R}^n$.

Definitions
The vectors $\mathbf{v}$ and $\mathbf{w}$ in $\mathbb{R}^n$ are orthogonal if $\mathbf{v} \cdot \mathbf{w} = 0$.

A set of vectors in $\mathbb{R}^n$ is an orthogonal set if every pair of distinct
vectors in the set is orthogonal.

An orthogonal basis for $\mathbb{R}^n$ is an orthogonal set that is a basis
for $\mathbb{R}^n$.


For example, in $\mathbb{R}^6$ the set
$$\{(1,1,1,1,1,1), (2,-2,2,-2,2,-2), (5,5,0,0,-5,-5)\}$$
is an orthogonal set, since
$$\begin{aligned}(1,1,1,1,1,1) \cdot (2,-2,2,-2,2,-2) &= 2 - 2 + 2 - 2 + 2 - 2 = 0,\\ (1,1,1,1,1,1) \cdot (5,5,0,0,-5,-5) &= 5 + 5 + 0 + 0 - 5 - 5 = 0\end{aligned}$$
and
$$(2,-2,2,-2,2,-2) \cdot (5,5,0,0,-5,-5) = 10 - 10 + 0 + 0 - 10 + 10 = 0.$$


Exercise C74 


Show that the set $\{(1,0,0,0,0), (0,2,0,0,0), (0,0,1,1,0)\}$ is an orthogonal
set in $\mathbb{R}^5$.


Note that the standard basis
$$\{(1,0,\ldots,0), (0,1,0,\ldots,0), \ldots, (0,\ldots,0,1)\}$$
is an orthogonal basis for $\mathbb{R}^n$.




In Subsection 5.1 you saw that any orthogonal set of three non-zero
vectors in $\mathbb{R}^3$ is linearly independent and therefore forms an orthogonal
basis for $\mathbb{R}^3$. Exactly the same methods can be used to prove the following
more general result.


Theorem C32 


Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ be an orthogonal set of non-zero vectors in
$\mathbb{R}^n$. Then $S$ is a linearly independent set.

Since any set of $n$ linearly independent vectors in $\mathbb{R}^n$ forms a basis for $\mathbb{R}^n$,
we obtain the following corollary to Theorem C32.

Corollary C33

Any orthogonal set of $n$ non-zero vectors in $\mathbb{R}^n$ is an orthogonal basis
for $\mathbb{R}^n$.
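Corollary C33 turns the check for an orthogonal basis into three simple tests: count the vectors, make sure none is zero, and test every pair. A minimal Python sketch, assuming vectors are tuples of numbers; `is_orthogonal_basis` is a hypothetical name of our own.

```python
from itertools import combinations


def scalar_product(v, w):
    return sum(a * b for a, b in zip(v, w))


def is_orthogonal_basis(vectors, n):
    """Apply Corollary C33: n non-zero, pairwise orthogonal vectors in R^n."""
    if len(vectors) != n or any(len(v) != n for v in vectors):
        return False
    if any(all(x == 0 for x in v) for v in vectors):
        return False  # the zero vector is not allowed
    return all(scalar_product(v, w) == 0 for v, w in combinations(vectors, 2))


print(is_orthogonal_basis([(2, 1, 1), (1, -4, 2), (-2, 1, 3)], 3))  # True
print(is_orthogonal_basis([(1, 0, 0), (0, 1, 0)], 3))              # False: only two vectors
```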


Exercise C75 


Show that
$$\{(1,2,1,0), (-1,1,-1,1), (1,0,-1,0), (1,-1,1,3)\}$$
is an orthogonal basis for $\mathbb{R}^4$.


Expressing vectors in terms of orthogonal bases 


Given an orthogonal basis for $\mathbb{R}^n$, it is particularly easy to express any
given vector as a linear combination of the basis vectors. As for $\mathbb{R}^3$ in
Subsection 5.1, we simply need to calculate scalar products: we do not
need to solve a system of linear equations.


Theorem C34 


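Let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be an orthogonal basis for $\mathbb{R}^n$ and let $\mathbf{u}$ be any
vector in $\mathbb{R}^n$. Then

$$\mathbf{u} = \frac{\mathbf{v}_1 \cdot \mathbf{u}}{\mathbf{v}_1 \cdot \mathbf{v}_1}\mathbf{v}_1 + \frac{\mathbf{v}_2 \cdot \mathbf{u}}{\mathbf{v}_2 \cdot \mathbf{v}_2}\mathbf{v}_2 + \cdots + \frac{\mathbf{v}_n \cdot \mathbf{u}}{\mathbf{v}_n \cdot \mathbf{v}_n}\mathbf{v}_n.$$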


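Proof Let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be an orthogonal basis for $\mathbb{R}^n$ and let $\mathbf{u}$ be
any vector in $\mathbb{R}^n$. Since $\mathbf{u} \in \mathbb{R}^n$, we can write $\mathbf{u}$ as a linear combination of
the basis vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$:

$$\mathbf{u} = \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_n\mathbf{v}_n. \qquad (4)$$

Forming the scalar product of both sides of equation (4) with $\mathbf{v}_1$ gives

$$\mathbf{v}_1 \cdot \mathbf{u} = \alpha_1(\mathbf{v}_1 \cdot \mathbf{v}_1) \quad \text{(all other terms are 0)},$$

so $\alpha_1 = \dfrac{\mathbf{v}_1 \cdot \mathbf{u}}{\mathbf{v}_1 \cdot \mathbf{v}_1}$.

Similarly, forming the scalar product of both sides of equation (4) with $\mathbf{v}_2$
gives

$$\mathbf{v}_2 \cdot \mathbf{u} = \alpha_2(\mathbf{v}_2 \cdot \mathbf{v}_2) \quad \text{(all other terms are 0)},$$

so $\alpha_2 = \dfrac{\mathbf{v}_2 \cdot \mathbf{u}}{\mathbf{v}_2 \cdot \mathbf{v}_2}$.

Continuing in this way, we deduce that

$$\alpha_i = \frac{\mathbf{v}_i \cdot \mathbf{u}}{\mathbf{v}_i \cdot \mathbf{v}_i} \quad \text{for each } i = 1, 2, \ldots, n.$$

Thus

$$\mathbf{u} = \frac{\mathbf{v}_1 \cdot \mathbf{u}}{\mathbf{v}_1 \cdot \mathbf{v}_1}\mathbf{v}_1 + \frac{\mathbf{v}_2 \cdot \mathbf{u}}{\mathbf{v}_2 \cdot \mathbf{v}_2}\mathbf{v}_2 + \cdots + \frac{\mathbf{v}_n \cdot \mathbf{u}}{\mathbf{v}_n \cdot \mathbf{v}_n}\mathbf{v}_n,$$

as required. ■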


The result of Theorem C34 can be expressed in the form of a strategy that 
generalises Strategy C11. 


Strategy C12 


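To express a vector $\mathbf{u}$ in $\mathbb{R}^n$ in terms of an orthogonal
basis $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$:

1. calculate $\alpha_1 = \dfrac{\mathbf{v}_1 \cdot \mathbf{u}}{\mathbf{v}_1 \cdot \mathbf{v}_1}$, $\alpha_2 = \dfrac{\mathbf{v}_2 \cdot \mathbf{u}}{\mathbf{v}_2 \cdot \mathbf{v}_2}$, $\ldots$, $\alpha_n = \dfrac{\mathbf{v}_n \cdot \mathbf{u}}{\mathbf{v}_n \cdot \mathbf{v}_n}$

2. write $\mathbf{u} = \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_n\mathbf{v}_n$.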
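Strategy C12 works in the same way in any dimension. The Python sketch below is our own illustration: it carries out both steps with exact `Fraction` arithmetic and then rebuilds $\mathbf{u}$ from the coefficients as a check. The helper names `express_in_orthogonal_basis` and `linear_combination` are hypothetical, and the orthogonal basis of $\mathbb{R}^4$ in the example was chosen for illustration; it does not come from the text.

```python
from fractions import Fraction


def scalar_product(v, w):
    return sum(a * b for a, b in zip(v, w))


def express_in_orthogonal_basis(u, basis):
    """Step 1 of Strategy C12: alpha_i = (v_i . u) / (v_i . v_i)."""
    return [Fraction(scalar_product(v, u), scalar_product(v, v)) for v in basis]


def linear_combination(alphas, basis):
    """Step 2: alpha_1 v_1 + ... + alpha_n v_n, computed coordinate by coordinate."""
    n = len(basis[0])
    return tuple(sum(a * v[k] for a, v in zip(alphas, basis)) for k in range(n))


# An orthogonal basis for R^4 chosen for illustration (not from the text).
basis = [(1, 1, 1, 1), (1, -1, 1, -1), (1, 1, -1, -1), (1, -1, -1, 1)]
u = (3, 1, 4, 2)
alphas = express_in_orthogonal_basis(u, basis)
print(alphas)  # [Fraction(5, 2), Fraction(1, 1), Fraction(-1, 2), Fraction(0, 1)]
print(linear_combination(alphas, basis) == u)  # True
```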


Exercise C76 


Express the vector $(1,2,3,4)$ in terms of the orthogonal basis for $\mathbb{R}^4$
$$\{(1,2,1,0), (-1,1,-1,1), (1,0,-1,0), (1,-1,1,3)\}.$$


(You showed that this basis is orthogonal in Exercise C75.) 




(Photographs: Erhard Schmidt; Jørgen Pedersen Gram)




5.3 Constructing orthogonal bases 
We now consider how to find an orthogonal basis. 


Suppose we want to find an orthogonal basis for $\mathbb{R}^3$ containing the vector
$(2,1,1)$. This means that we need to find two more vectors orthogonal to
each other and orthogonal to the vector $(2,1,1)$.

Now recall from Unit A1 that in $\mathbb{R}^3$ a vector normal to a plane is
perpendicular (orthogonal) to every vector in this plane. Thus to find such
a pair of vectors, we can find two orthogonal vectors in the plane through
the origin that has normal vector $(2,1,1)$.

Using the vector equation of a plane from Unit A1, the vector equation of
a plane through the origin with normal vector $\mathbf{n}$ is

$$\mathbf{x} \cdot \mathbf{n} = 0,$$

so here we have $(x,y,z) \cdot (2,1,1) = 0$; that is, the equation of the plane is
$2x + y + z = 0$.


Rather than pulling two orthogonal vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in this plane out of
a hat, we start with any pair of linearly independent vectors in this plane
and follow a method known as the Gram-Schmidt orthogonalisation
process to construct a pair of orthogonal vectors.


In 1907, the German mathematician Erhard Schmidt (1876-1959) 
published an orthogonalisation algorithm, which became widely used. 
Schmidt acknowledged that his process was essentially the same as 
that published by the Danish mathematician Jørgen Pedersen Gram 
(1850-1916) in 1883. It appears that their names were first linked 
together in the 1930s. A related algorithm (now known as modified 
Gram-Schmidt) had been used much earlier by the French 
mathematician and scientist Pierre-Simon Laplace (1749-1827) in an 
attempt to estimate the masses of Jupiter and Saturn using the 
astronomical data of six planets. 


To find a pair of linearly independent vectors in the plane $2x + y + z = 0$,
we need to find any two vectors in this plane that are not multiples of one
another. We choose suitable vectors that are as simple as possible, for
example, ones containing small numbers and zeros. We start by setting $x$
to 1 and then setting $z$ and $y$ to 0 in turn, to get a pair of vectors. This
gives

$$\mathbf{w}_1 = (1,-2,0) \quad\text{and}\quad \mathbf{w}_2 = (1,0,-2).$$

Since these vectors are linearly independent, the set $\{\mathbf{w}_1, \mathbf{w}_2\}$ forms a basis
for this plane. (Any other pair of linearly independent vectors in the plane
would do just as well.)


We take the first vector $\mathbf{v}_1$ in our orthogonal basis to be the first of these
vectors, so

$$\mathbf{v}_1 = \mathbf{w}_1 = (1,-2,0).$$

For the second vector $\mathbf{v}_2$ in our orthogonal basis, we start with $\mathbf{w}_2$ and
then subtract from it a suitable multiple $\alpha$ of $\mathbf{v}_1$, chosen so that $\mathbf{v}_1$ and $\mathbf{v}_2$
are orthogonal, as illustrated in Figure 14. Since $\mathbf{v}_2$ is a linear combination
of vectors in the plane and the plane is a subspace, we know that $\mathbf{v}_2$ is also
in the plane.


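So we set
$$\mathbf{v}_2 = \mathbf{w}_2 - \alpha\mathbf{v}_1;$$
that is,
$$\mathbf{v}_2 = (1,0,-2) - \alpha(1,-2,0).$$

We want to find the value of $\alpha$ so that $\mathbf{v}_1$ and $\mathbf{v}_2$ are orthogonal.
Therefore we must have
$$\mathbf{v}_1 \cdot \mathbf{v}_2 = \mathbf{v}_1 \cdot (\mathbf{w}_2 - \alpha\mathbf{v}_1) = \mathbf{v}_1 \cdot \mathbf{w}_2 - \alpha\,\mathbf{v}_1 \cdot \mathbf{v}_1 = 0.$$
Hence
$$\alpha = \frac{\mathbf{v}_1 \cdot \mathbf{w}_2}{\mathbf{v}_1 \cdot \mathbf{v}_1};$$
that is, in this case
$$\alpha = \frac{(1,-2,0) \cdot (1,0,-2)}{(1,-2,0) \cdot (1,-2,0)} = \frac{1}{5}.$$
Thus
$$\mathbf{v}_2 = (1,0,-2) - \tfrac{1}{5}(1,-2,0) = \left(\tfrac{4}{5}, \tfrac{2}{5}, -2\right).$$
So an orthogonal basis for the plane is $\left\{(1,-2,0), \left(\tfrac{4}{5}, \tfrac{2}{5}, -2\right)\right\}$.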
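The single adjustment step above is short enough to check by machine. This Python sketch is our own illustration (the variable names simply mirror the symbols in the text): it computes $\alpha$ and $\mathbf{v}_2$ exactly with `Fraction`, then confirms that $\mathbf{v}_2$ is orthogonal to $\mathbf{v}_1$ and still lies in the plane $2x + y + z = 0$.

```python
from fractions import Fraction


def scalar_product(v, w):
    return sum(a * b for a, b in zip(v, w))


w1, w2 = (1, -2, 0), (1, 0, -2)
v1 = w1
alpha = Fraction(scalar_product(v1, w2), scalar_product(v1, v1))  # 1/5
v2 = tuple(b - alpha * a for a, b in zip(v1, w2))                 # w2 - alpha * v1

print(alpha)                          # 1/5
print(v2)                             # (Fraction(4, 5), Fraction(2, 5), Fraction(-2, 1))
print(scalar_product(v1, v2))         # 0
print(scalar_product((2, 1, 1), v2))  # 0, so v2 lies in the plane
```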


Returning to the original problem, this means that we have found that an
orthogonal basis for $\mathbb{R}^3$ containing the vector $(2,1,1)$ is

$$\left\{(2,1,1), (1,-2,0), \left(\tfrac{4}{5}, \tfrac{2}{5}, -2\right)\right\}.$$

The next exercise asks you to find an orthogonal basis for $\mathbb{R}^3$ containing a
given vector by using the above method.




Figure 14 Subtracting a bit of $\mathbf{v}_1$ from $\mathbf{w}_2$ to get an orthogonal vector $\mathbf{v}_2 = \mathbf{w}_2 - \alpha\mathbf{v}_1$




Exercise C77 


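(a) Find the equation of the plane through the origin with normal vector
$\mathbf{n} = (3,-4,5)$.

(b) Show that the vectors $\mathbf{w}_1 = (4,3,0)$ and $\mathbf{w}_2 = (0,5,4)$ lie in this plane.

(c) Find an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2\}$ for the plane where $\mathbf{v}_1 = \mathbf{w}_1$, and
$$\mathbf{v}_2 = \mathbf{w}_2 - \frac{\mathbf{v}_1 \cdot \mathbf{w}_2}{\mathbf{v}_1 \cdot \mathbf{v}_1}\mathbf{v}_1.$$

(d) Hence write down an orthogonal basis for $\mathbb{R}^3$ containing the vector
$(3,-4,5)$.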


In these examples we started with a pair of arbitrary basis vectors for a 
plane and adjusted the second to obtain a pair of orthogonal basis vectors. 
This method can be extended to higher-dimensional spaces by starting 
with an arbitrary basis and adjusting the basis vectors one by one to 
obtain an orthogonal basis. It is called the Gram-Schmidt 
orthogonalisation process. 


Theorem C35 Gram-Schmidt orthogonalisation process 


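Let $\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n\}$ be a basis for $\mathbb{R}^n$, and let
$$\begin{aligned}
\mathbf{v}_1 &= \mathbf{w}_1,\\
\mathbf{v}_2 &= \mathbf{w}_2 - \frac{\mathbf{v}_1 \cdot \mathbf{w}_2}{\mathbf{v}_1 \cdot \mathbf{v}_1}\mathbf{v}_1,\\
\mathbf{v}_3 &= \mathbf{w}_3 - \frac{\mathbf{v}_1 \cdot \mathbf{w}_3}{\mathbf{v}_1 \cdot \mathbf{v}_1}\mathbf{v}_1 - \frac{\mathbf{v}_2 \cdot \mathbf{w}_3}{\mathbf{v}_2 \cdot \mathbf{v}_2}\mathbf{v}_2,\\
&\;\;\vdots\\
\mathbf{v}_n &= \mathbf{w}_n - \frac{\mathbf{v}_1 \cdot \mathbf{w}_n}{\mathbf{v}_1 \cdot \mathbf{v}_1}\mathbf{v}_1 - \frac{\mathbf{v}_2 \cdot \mathbf{w}_n}{\mathbf{v}_2 \cdot \mathbf{v}_2}\mathbf{v}_2 - \cdots - \frac{\mathbf{v}_{n-1} \cdot \mathbf{w}_n}{\mathbf{v}_{n-1} \cdot \mathbf{v}_{n-1}}\mathbf{v}_{n-1}.
\end{aligned}$$

Then $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is an orthogonal basis for $\mathbb{R}^n$.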


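Proof We show that each vector in the set $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is
orthogonal to every other vector in the set.

We note first that $\mathbf{v}_2$ is orthogonal to $\mathbf{v}_1$, since
$$\begin{aligned}
\mathbf{v}_1 \cdot \mathbf{v}_2 &= \mathbf{v}_1 \cdot \left(\mathbf{w}_2 - \frac{\mathbf{v}_1 \cdot \mathbf{w}_2}{\mathbf{v}_1 \cdot \mathbf{v}_1}\mathbf{v}_1\right)\\
&= (\mathbf{v}_1 \cdot \mathbf{w}_2) - \left(\frac{\mathbf{v}_1 \cdot \mathbf{w}_2}{\mathbf{v}_1 \cdot \mathbf{v}_1}\right)(\mathbf{v}_1 \cdot \mathbf{v}_1)\\
&= (\mathbf{v}_1 \cdot \mathbf{w}_2) - (\mathbf{v}_1 \cdot \mathbf{w}_2) = 0.
\end{aligned}$$

Next we note that $\mathbf{v}_3$ is orthogonal to both $\mathbf{v}_1$ and $\mathbf{v}_2$, since
$$\begin{aligned}
\mathbf{v}_1 \cdot \mathbf{v}_3 &= \mathbf{v}_1 \cdot \left(\mathbf{w}_3 - \frac{\mathbf{v}_1 \cdot \mathbf{w}_3}{\mathbf{v}_1 \cdot \mathbf{v}_1}\mathbf{v}_1 - \frac{\mathbf{v}_2 \cdot \mathbf{w}_3}{\mathbf{v}_2 \cdot \mathbf{v}_2}\mathbf{v}_2\right)\\
&= (\mathbf{v}_1 \cdot \mathbf{w}_3) - \left(\frac{\mathbf{v}_1 \cdot \mathbf{w}_3}{\mathbf{v}_1 \cdot \mathbf{v}_1}\right)(\mathbf{v}_1 \cdot \mathbf{v}_1) - \left(\frac{\mathbf{v}_2 \cdot \mathbf{w}_3}{\mathbf{v}_2 \cdot \mathbf{v}_2}\right)(\mathbf{v}_1 \cdot \mathbf{v}_2)\\
&= (\mathbf{v}_1 \cdot \mathbf{w}_3) - (\mathbf{v}_1 \cdot \mathbf{w}_3) - 0 = 0,
\end{aligned}$$
because $\mathbf{v}_1$ and $\mathbf{v}_2$ are orthogonal.

Similarly,
$$\begin{aligned}
\mathbf{v}_2 \cdot \mathbf{v}_3 &= \mathbf{v}_2 \cdot \left(\mathbf{w}_3 - \frac{\mathbf{v}_1 \cdot \mathbf{w}_3}{\mathbf{v}_1 \cdot \mathbf{v}_1}\mathbf{v}_1 - \frac{\mathbf{v}_2 \cdot \mathbf{w}_3}{\mathbf{v}_2 \cdot \mathbf{v}_2}\mathbf{v}_2\right)\\
&= (\mathbf{v}_2 \cdot \mathbf{w}_3) - \left(\frac{\mathbf{v}_1 \cdot \mathbf{w}_3}{\mathbf{v}_1 \cdot \mathbf{v}_1}\right)(\mathbf{v}_2 \cdot \mathbf{v}_1) - \left(\frac{\mathbf{v}_2 \cdot \mathbf{w}_3}{\mathbf{v}_2 \cdot \mathbf{v}_2}\right)(\mathbf{v}_2 \cdot \mathbf{v}_2)\\
&= (\mathbf{v}_2 \cdot \mathbf{w}_3) - 0 - (\mathbf{v}_2 \cdot \mathbf{w}_3) = 0.
\end{aligned}$$

Continuing in this way, we deduce that each of the vectors $\mathbf{v}_i$ is orthogonal
to all the previous ones. It follows that $\mathbf{v}_i \cdot \mathbf{v}_j = 0$ for all $i, j$ with $i \neq j$.
Moreover, no $\mathbf{v}_i$ is the zero vector: each $\mathbf{v}_i$ is $\mathbf{w}_i$ minus a linear combination of
$\mathbf{w}_1, \ldots, \mathbf{w}_{i-1}$, so $\mathbf{v}_i = \mathbf{0}$ would contradict the linear independence of
$\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n\}$. Hence, by Corollary C33, $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is an orthogonal basis for $\mathbb{R}^n$. ■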
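The theorem describes an algorithm: take the basis vectors in order, and from each $\mathbf{w}_i$ subtract $\frac{\mathbf{v}_j \cdot \mathbf{w}_i}{\mathbf{v}_j \cdot \mathbf{v}_j}\mathbf{v}_j$ for every earlier $\mathbf{v}_j$. Below is a minimal Python sketch of this process, assuming the input really is a basis given as integer or fractional tuples; exact `Fraction` arithmetic avoids rounding, and `gram_schmidt` is our own name, not a library function.

```python
from fractions import Fraction
from itertools import combinations


def scalar_product(v, w):
    return sum(a * b for a, b in zip(v, w))


def gram_schmidt(basis):
    """Turn a basis w_1, ..., w_n into an orthogonal basis v_1, ..., v_n (Theorem C35)."""
    orthogonal = []
    for w in basis:
        v = tuple(Fraction(x) for x in w)
        for prev in orthogonal:
            # Subtract (prev . w) / (prev . prev) times prev, as in the theorem.
            coeff = scalar_product(prev, w) / scalar_product(prev, prev)
            v = tuple(a - coeff * b for a, b in zip(v, prev))
        orthogonal.append(v)
    return orthogonal


vs = gram_schmidt([(2, 1, 1), (1, -2, 0), (1, 0, -2)])
print([tuple(str(x) for x in v) for v in vs])
# [('2', '1', '1'), ('1', '-2', '0'), ('4/5', '2/5', '-2')]
print(all(scalar_product(v, w) == 0 for v, w in combinations(vs, 2)))  # True
```

Here each coefficient uses the original $\mathbf{w}_i$, as in Theorem C35. Using the partly reduced vector `v` in its place gives the "modified" variant mentioned in the historical note; in exact arithmetic the two produce the same vectors, although they can behave differently in floating point.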


Exercise C78 


Apply the Gram-Schmidt orthogonalisation process to the following basis
for $\mathbb{R}^5$:

$$\{(1,0,0,0,0), (0,2,0,0,0), (0,0,1,1,0), (1,1,1,1,1), (1,0,-1,0,1)\}.$$

(You showed, in Exercise C74, that $\{(1,0,0,0,0), (0,2,0,0,0), (0,0,1,1,0)\}$
is an orthogonal set in $\mathbb{R}^5$.)


5.4 Orthonormal bases 


You have seen that using orthogonal basis vectors can be helpful. However,
in many examples it is also useful to require one further condition: that
the basis vectors are all unit vectors, as in the standard basis for $\mathbb{R}^n$.

Recall, from Unit A1, that the magnitude of a vector $\mathbf{v}$ in $\mathbb{R}^2$ or $\mathbb{R}^3$ is

$$|\mathbf{v}| = \sqrt{\mathbf{v} \cdot \mathbf{v}}.$$

For example, if $\mathbf{v} = (5,-12)$, then $|\mathbf{v}| = \sqrt{5^2 + (-12)^2} = \sqrt{169} = 13$, as
illustrated in Figure 15.

Figure 15 The magnitude of the vector $(5,-12)$


We can similarly define the magnitude of a vector in $\mathbb{R}^n$, for any positive
integer $n$.




Definition 
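Let $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ be a vector in $\mathbb{R}^n$. Then the magnitude of $\mathbf{v}$
is

$$|\mathbf{v}| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}.$$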
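The magnitude is the square root of a scalar product, so it needs only one extra library call. In this Python sketch (an illustration; `magnitude` is our own name) `math.sqrt` returns a float, so results are in general approximate.

```python
import math


def magnitude(v):
    """|v| = sqrt(v . v) = sqrt(v_1^2 + ... + v_n^2)."""
    return math.sqrt(sum(x * x for x in v))


print(magnitude((5, -12)))  # 13.0
```

Since Python 3.8, `math.hypot(*v)` computes the same quantity directly.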


Exercise C79 


Calculate the magnitude of each of the following vectors. 
(a) $(3,-4,5)$ in $\mathbb{R}^3$. (b) $(1,2,-1,0,3)$ in $\mathbb{R}^5$.


Exercise C80 


Prove that if $\mathbf{v}$ is any non-zero vector in $\mathbb{R}^n$, then the vector

$$\frac{1}{|\mathbf{v}|}\mathbf{v}$$

has magnitude 1.


We make the following important definition. 


Definition 
An orthonormal basis for $\mathbb{R}^n$ is an orthogonal basis in which each
basis vector has magnitude 1.

An orthonormal basis therefore consists of orthogonal unit vectors.


It follows from the result of Exercise C80 that, given an orthogonal basis
for $\mathbb{R}^n$, we can obtain an orthonormal basis by scalar multiplication: we
need to multiply each basis vector by the reciprocal of its magnitude. This
leads to the following strategy for constructing an orthonormal basis.


Strategy C13 


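To construct an orthonormal basis for $\mathbb{R}^n$ from an orthogonal basis
$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$:

1. calculate the magnitude of each basis vector

2. scalar multiply each basis vector by the reciprocal of its magnitude.

The required orthonormal basis is
$$\left\{\frac{\mathbf{v}_1}{|\mathbf{v}_1|}, \frac{\mathbf{v}_2}{|\mathbf{v}_2|}, \ldots, \frac{\mathbf{v}_n}{|\mathbf{v}_n|}\right\}.$$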
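Strategy C13 amounts to dividing each vector by its magnitude. The Python sketch below is our own illustration (`normalise` is a hypothetical name). Because magnitudes such as $\sqrt{6}$ are irrational, it works in floating point, so its checks use a tolerance (`math.isclose`) rather than exact equality.

```python
import math
from itertools import combinations


def scalar_product(v, w):
    return sum(a * b for a, b in zip(v, w))


def normalise(v):
    """Scalar multiply v by the reciprocal of its magnitude."""
    size = math.sqrt(scalar_product(v, v))
    return tuple(x / size for x in v)


orthonormal = [normalise(v) for v in [(2, 1, 1), (1, -4, 2), (-2, 1, 3)]]

# Each vector now has magnitude 1 (up to rounding) ...
print(all(math.isclose(scalar_product(v, v), 1.0) for v in orthonormal))  # True
# ... and distinct vectors are still orthogonal.
print(all(math.isclose(scalar_product(v, w), 0.0, abs_tol=1e-12)
          for v, w in combinations(orthonormal, 2)))  # True
```

Note that `math.isclose(x, 0.0)` needs an explicit `abs_tol`, because its default relative tolerance accepts only an exact zero.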




As a shorthand for ‘scalar multiply a vector by the reciprocal of its 
magnitude’, we may say ‘divide a vector by its magnitude’. 


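For example, we can use Strategy C13 to obtain an orthonormal basis
for $\mathbb{R}^3$ starting with the orthogonal basis $\{(2,1,1), (1,-4,2), (-2,1,3)\}$, as
follows. We calculate the magnitude of each basis vector:

$$\begin{aligned}
|(2,1,1)| &= \sqrt{2^2 + 1^2 + 1^2} = \sqrt{6},\\
|(1,-4,2)| &= \sqrt{1^2 + (-4)^2 + 2^2} = \sqrt{21},\\
|(-2,1,3)| &= \sqrt{(-2)^2 + 1^2 + 3^2} = \sqrt{14}.
\end{aligned}$$

Dividing each orthogonal basis vector by its magnitude, we obtain the
orthonormal basis

$$\left\{\tfrac{1}{\sqrt{6}}(2,1,1), \tfrac{1}{\sqrt{21}}(1,-4,2), \tfrac{1}{\sqrt{14}}(-2,1,3)\right\}.$$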

Exercise C81 


Construct an orthonormal basis for $\mathbb{R}^4$, starting with the basis
$\{(1,2,1,0), (-1,1,-1,1), (1,0,-1,0), (1,-1,1,3)\}$.

(You showed, in Exercise C75, that this is an orthogonal basis for $\mathbb{R}^4$.)


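Note that some of our earlier results become much simpler if we use an
orthonormal basis, rather than an orthogonal one. For example,
Theorem C34 takes the following form because $\mathbf{v}_i \cdot \mathbf{v}_i = |\mathbf{v}_i|^2 = 1$ for each $i = 1, 2, \ldots, n$.

Theorem C36

Let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be an orthonormal basis for $\mathbb{R}^n$, and let $\mathbf{u}$ be any
vector in $\mathbb{R}^n$. Then

$$\mathbf{u} = (\mathbf{v}_1 \cdot \mathbf{u})\mathbf{v}_1 + (\mathbf{v}_2 \cdot \mathbf{u})\mathbf{v}_2 + \cdots + (\mathbf{v}_n \cdot \mathbf{u})\mathbf{v}_n.$$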
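With an orthonormal basis the coefficients are bare scalar products, with no division. The Python sketch below is our own floating-point illustration: it expands $\mathbf{u} = (10,0,4)$ in the orthonormal basis that Strategy C13 produces from $\{(2,1,1), (1,-4,2), (-2,1,3)\}$, then rebuilds $\mathbf{u}$ from the coefficients. The helper `normalise` is a hypothetical name.

```python
import math


def scalar_product(v, w):
    return sum(a * b for a, b in zip(v, w))


def normalise(v):
    size = math.sqrt(scalar_product(v, v))
    return tuple(x / size for x in v)


basis = [normalise(v) for v in [(2, 1, 1), (1, -4, 2), (-2, 1, 3)]]
u = (10, 0, 4)

# Theorem C36: the coefficient of v_i is simply v_i . u.
coefficients = [scalar_product(v, u) for v in basis]
rebuilt = [sum(c * v[k] for c, v in zip(coefficients, basis)) for k in range(3)]

print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(rebuilt, u)))  # True
```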


5.5 Other vector spaces 


We conclude this section by remarking that it is possible to define scalar
products in vector spaces other than $\mathbb{R}^n$. For example, in the vector
space $P$ we can define the scalar product of two polynomials $p_1$ and $p_2$ by

$$p_1 \cdot p_2 = \int_{-1}^{1} p_1(x)\,p_2(x)\,dx.$$

Such a scalar product is a real number and has properties that are very
similar to those of the scalar product in $\mathbb{R}^n$; for example, $p_1 \cdot p_2 = p_2 \cdot p_1$
for any polynomials $p_1$ and $p_2$.




We can then define such concepts as orthogonal polynomials, the magnitude
of a polynomial, and the distance and angle between two polynomials. For
example, the polynomials $p_1(x) = x$ and $p_2(x) = x^2$ are orthogonal, since

$$p_1 \cdot p_2 = \int_{-1}^{1} x \cdot x^2\,dx = \left[\tfrac{1}{4}x^4\right]_{-1}^{1} = 0,$$

and the magnitude of $p_2$ is given by

$$|p_2|^2 = p_2 \cdot p_2 = \int_{-1}^{1} x^2 \cdot x^2\,dx = \left[\tfrac{1}{5}x^5\right]_{-1}^{1} = \tfrac{2}{5},$$

so $|p_2| = \sqrt{\tfrac{2}{5}}$.
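These integrals can be computed exactly from the coefficients, because $\int_{-1}^{1} x^k\,dx$ equals $\frac{2}{k+1}$ when $k$ is even and $0$ when $k$ is odd. The Python sketch below is our own illustration built on that observation. It assumes a polynomial is stored as a list of coefficients `[a0, a1, a2, ...]` (so $x^2$ is `[0, 0, 1]`), and `poly_scalar_product` is a hypothetical name.

```python
from fractions import Fraction


def poly_scalar_product(p, q):
    """Integral from -1 to 1 of p(x) q(x) dx, for coefficient lists p and q."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            k = i + j  # a x^i times b x^j contributes a*b x^k
            if k % 2 == 0:  # odd powers integrate to 0 over [-1, 1]
                total += Fraction(2, k + 1) * a * b
    return total


p1 = [0, 1]     # p1(x) = x
p2 = [0, 0, 1]  # p2(x) = x^2
print(poly_scalar_product(p1, p2))  # 0
print(poly_scalar_product(p2, p2))  # 2/5
```

Taking the square root of the second value gives $|p_2| = \sqrt{2/5}$.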


Although such concepts may seem at first sight to make little sense 
intuitively, they have proved to be of great interest and importance, for 
example in mathematical physics. They also show that the mathematical 
structures we have introduced theoretically here can have surprising 
applications in other contexts. 


Summary 


In this unit you have seen how familiar properties of $\mathbb{R}^2$ and $\mathbb{R}^3$ can be
generalised to other, very different sets of vectors through the concept of a
vector space.

Your study of vector spaces has been driven by looking at properties of $\mathbb{R}^2$
and $\mathbb{R}^3$, such as linear combinations, linear independence and spanning
sets of vectors. You have seen how the familiar concept of axes and our
intuitive idea of dimension relate to bases of these spaces. You have seen
how these concepts generalise to $\mathbb{R}^n$ and other, very different vector spaces
such as $P_n$, $M_{m,n}$ and $\mathbb{C}$. You have met the Basis Theorem, which states
that every basis for a given vector space has the same number of vectors,
and that this number is the dimension of the vector space.

Starting with subspaces of $\mathbb{R}^2$ and $\mathbb{R}^3$ that can be visualised geometrically,
you have seen that subspaces of vector spaces are subsets that are
themselves vector spaces, in the same way that subgroups are subsets of
groups that are themselves groups.

Finally, you have seen how the scalar product and orthogonality of vectors
in $\mathbb{R}^n$ can be used to find orthogonal and orthonormal bases, which are
particularly straightforward to work with.

Vector spaces will underpin the remainder of the linear algebra units; in
particular you will study functions between vector spaces in Unit C3
Linear transformations and use orthonormal bases to classify conics and
quadrics in Unit C4 Eigenvectors.


Learning outcomes 




After working through this unit, you should be able to: 
• understand the definition of a real vector space
• check whether or not a given set of elements forms a vector space under the operations of vector addition and scalar multiplication
• explain the meaning of the terms linear combination, span and spanning set
• form linear combinations of vectors in a given set
• check whether a vector can be expressed as a linear combination of given vectors
• find the set spanned by a given set of vectors
• check whether a given set of vectors spans the vector space to which the vectors belong
• explain the meaning of the terms linear independence, linear dependence, basis and dimension
• test whether a given set of vectors is linearly independent
• test whether a given set of vectors is a basis for a given vector space
• find the E-coordinate representation of a vector given in standard coordinates, and vice versa
• explain what is meant by a subspace of a vector space
• test whether a given subset of a vector space is a subspace
• find a basis for a subspace, and hence find its dimension
• check whether the vectors in a given set are orthogonal
• express a given vector in terms of an orthogonal basis
• use the Gram-Schmidt orthogonalisation process to find orthogonal bases in $\mathbb{R}^n$
• given an orthogonal basis, construct an orthonormal basis.




Solutions to exercises 


Solution to Exercise C44
$$\begin{aligned}\mathbf{u} + \mathbf{v} &= (1,-1,2,0,-3) + (0,2,-1,4,0) = (1,1,1,4,-3)\\ -3\mathbf{u} &= -3(1,-1,2,0,-3) = (-3,3,-6,0,9)\end{aligned}$$

Solution to Exercise C45
Let $\mathbf{u} = (u_1,u_2,u_3,u_4)$, $\mathbf{v} = (v_1,v_2,v_3,v_4)$ and $\mathbf{w} = (w_1,w_2,w_3,w_4)$.
(a)
$$\begin{aligned}
(\mathbf{u}+\mathbf{v})+\mathbf{w} &= ((u_1,u_2,u_3,u_4) + (v_1,v_2,v_3,v_4)) + (w_1,w_2,w_3,w_4)\\
&= (u_1+v_1, u_2+v_2, u_3+v_3, u_4+v_4) + (w_1,w_2,w_3,w_4)\\
&= (u_1+v_1+w_1, u_2+v_2+w_2, u_3+v_3+w_3, u_4+v_4+w_4),\\
\mathbf{u}+(\mathbf{v}+\mathbf{w}) &= (u_1,u_2,u_3,u_4) + ((v_1,v_2,v_3,v_4) + (w_1,w_2,w_3,w_4))\\
&= (u_1,u_2,u_3,u_4) + (v_1+w_1, v_2+w_2, v_3+w_3, v_4+w_4)\\
&= (u_1+v_1+w_1, u_2+v_2+w_2, u_3+v_3+w_3, u_4+v_4+w_4).
\end{aligned}$$
Therefore $(\mathbf{u}+\mathbf{v})+\mathbf{w} = \mathbf{u}+(\mathbf{v}+\mathbf{w})$, and so the associative property (A2) holds.
(b)
$$\begin{aligned}
\mathbf{v} + (-\mathbf{v}) &= (v_1,v_2,v_3,v_4) + (-v_1,-v_2,-v_3,-v_4)\\
&= (v_1-v_1, v_2-v_2, v_3-v_3, v_4-v_4)\\
&= (0,0,0,0) = \mathbf{0}.
\end{aligned}$$
Also, using the commutative property (A5) (proved in Worked Exercise C22(a)) we have
$$\mathbf{v} + (-\mathbf{v}) = \mathbf{0} = -\mathbf{v} + \mathbf{v},$$
so the additive inverses property (A4) holds.

Solution to Exercise C46
(a)
$$\begin{aligned}
(p_1(x) + p_2(x)) + p_3(x) &= ((a_1 + b_1x + c_1x^2) + (a_2 + b_2x + c_2x^2)) + (a_3 + b_3x + c_3x^2)\\
&= ((a_1+a_2) + (b_1+b_2)x + (c_1+c_2)x^2) + (a_3 + b_3x + c_3x^2)\\
&= (a_1+a_2+a_3) + (b_1+b_2+b_3)x + (c_1+c_2+c_3)x^2
\end{aligned}$$
and
$$\begin{aligned}
p_1(x) + (p_2(x) + p_3(x)) &= (a_1 + b_1x + c_1x^2) + ((a_2 + b_2x + c_2x^2) + (a_3 + b_3x + c_3x^2))\\
&= (a_1 + b_1x + c_1x^2) + ((a_2+a_3) + (b_2+b_3)x + (c_2+c_3)x^2)\\
&= (a_1+a_2+a_3) + (b_1+b_2+b_3)x + (c_1+c_2+c_3)x^2.
\end{aligned}$$
Therefore
$$(p_1(x) + p_2(x)) + p_3(x) = p_1(x) + (p_2(x) + p_3(x)),$$
and so the associative property (A2) holds for addition in $P_3$.
(b) We have $0 = 0 + 0x + 0x^2$, so
$$\begin{aligned}
p_1(x) + 0 &= (a_1 + b_1x + c_1x^2) + (0 + 0x + 0x^2)\\
&= (a_1+0) + (b_1+0)x + (c_1+0)x^2\\
&= a_1 + b_1x + c_1x^2 = p_1(x).
\end{aligned}$$
Also, using the commutative property (A5) (proved in Worked Exercise C23(a)) we have
$$p_1(x) + 0 = p_1(x) = 0 + p_1(x),$$
so the additive identity property (A3) holds for addition in $P_3$.

Solution to Exercise C47
(a)
$$\begin{aligned}
1 \times p(x) &= 1 \times (1 - x + 2x^2)\\
&= 1 \times 1 - 1 \times x + 1 \times 2x^2\\
&= 1 - x + 2x^2 = p(x),
\end{aligned}$$
and therefore the identity property (S3) holds here.

(b)
$$\begin{aligned}
\alpha(\beta p(x)) &= 2(-3(1 - x + 2x^2))\\
&= 2(-3 + 3x - 6x^2)\\
&= -6 + 6x - 12x^2\\
&= -6(1 - x + 2x^2) = (\alpha\beta)p(x),
\end{aligned}$$
and therefore the associative property (S2) holds
here.


Solution to Exercise C48 


(a) Consider $(1,3)$ and $(2,5)$, both in $V$. Then
$(1,3) + (2,5) = (3,8)$, which does not belong to the
set $V$, since $2 \times 3 + 1 = 7 \neq 8$. So the set is not
closed under vector addition.

Therefore the set of all ordered pairs $(x,y)$ with
$y = 2x + 1$ fails to satisfy the closure axiom (A1),
so is not a real vector space.

Alternatively, note that for $(0,0) \in \mathbb{R}^2$ we have
$2 \times 0 + 1 = 1 \neq 0$, so the zero vector is not in $V$
and the additive identity axiom (A3) fails.

Other axioms also fail or do not make sense.


(b) Consider the matrix A = (o 5) and 


a= $. Then aA = 7 3): which does not 


1 
2 2? 
belong to the set. 


Therefore the set of matrices of the form 


¢ c) with a,b,c E€ Z 
b c 


fails to satisfy the closure axiom (S1), so is not a 
real vector space. 


Note that axioms A1-A5 and S3 do all hold here,
but since axiom S1 fails, the axioms S2, D1 and D2
are meaningless.


Solution to Exercise C49 


(a) $4\mathbf{v}_1 - 2\mathbf{v}_2 = 4(0,3) - 2(2,1) = (0,12) - (4,2) = (-4,10)$
(b) $3\mathbf{v}_1 + 2\mathbf{v}_2 = 3(1,2,1,3) + 2(2,1,0,-1) = (3,6,3,9) + (4,2,0,-2) = (7,8,3,7)$


Solution to Exercise C50 


(a) $2\mathbf{v}_1 - 4\mathbf{v}_2 = 2(2 - x + 3x^2) - 4(-1 + x) = (4 - 2x + 6x^2) - (-4 + 4x) = 8 - 6x + 6x^2$

(b) $2\mathbf{v}_1 - 4\mathbf{v}_2 = 2\sin x - 4x\cos x$


(c) 2v, — 4v2 = 2 E o) —4 G E 
-( i o) Co =s) 
-(4 2) 


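Solution to Exercise C51
We apply Strategy C6.
(a) Let $\alpha$ and $\beta$ be real numbers such that
$$(2,4) = \alpha(0,3) + \beta(2,1) = (2\beta, 3\alpha + \beta).$$
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} 2\beta &= 2\\ 3\alpha + \beta &= 4. \end{aligned}$$
The first equation gives $\beta = 1$, and substituting this into the second equation gives $\alpha = 1$, so
$$(2,4) = (0,3) + (2,1).$$
(You might have spotted this linear combination without performing the calculations; it is always worth checking there is not an obvious solution before diving into a strategy!)

(b) Let $\alpha$, $\beta$ and $\gamma$ be real numbers such that
$$(2,3,-2) = \alpha(0,1,0) + \beta(1,2,-1) + \gamma(1,1,-2).$$
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} \beta + \gamma &= 2\\ \alpha + 2\beta + \gamma &= 3\\ -\beta - 2\gamma &= -2. \end{aligned}$$
Adding the first and third equations gives $\gamma = 0$, and substituting this into the first equation gives $\beta = 2$. Substituting both these values into the second equation gives $\alpha = -1$, so
$$(2,3,-2) = -(0,1,0) + 2(1,2,-1) + 0(1,1,-2).$$

(c) Let $\alpha$ and $\beta$ be real numbers such that
$$\begin{pmatrix} 3 & 1\\ 0 & 4 \end{pmatrix} = \alpha\begin{pmatrix} 1 & -1\\ 0 & 2 \end{pmatrix} + \beta\begin{pmatrix} 0 & -2\\ 0 & 1 \end{pmatrix} = \begin{pmatrix} \alpha & -\alpha - 2\beta\\ 0 & 2\alpha + \beta \end{pmatrix}.$$
Equating corresponding entries, we obtain the system
$$\begin{aligned} \alpha &= 3\\ -\alpha - 2\beta &= 1\\ 2\alpha + \beta &= 4. \end{aligned}$$
The first equation gives $\alpha = 3$, and substituting this into the second equation gives $\beta = -2$. These values also satisfy the third equation, so
$$\begin{pmatrix} 3 & 1\\ 0 & 4 \end{pmatrix} = 3\begin{pmatrix} 1 & -1\\ 0 & 2 \end{pmatrix} - 2\begin{pmatrix} 0 & -2\\ 0 & 1 \end{pmatrix}.$$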


Solution to Exercise C52 
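(a) We write
$$(1,5,4) = \alpha\mathbf{v}_1 + \beta\mathbf{v}_2 = \alpha(1,0,3) + \beta(0,2,0) = (\alpha, 2\beta, 3\alpha).$$
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} \alpha &= 1\\ 2\beta &= 5\\ 3\alpha &= 4. \end{aligned}$$
This system is inconsistent and therefore has no solution. So $(1,5,4)$ does not lie in the subset of $\mathbb{R}^3$ spanned by $\{\mathbf{v}_1, \mathbf{v}_2\}$; that is, $(1,5,4)$ does not belong to $\langle\{\mathbf{v}_1, \mathbf{v}_2\}\rangle$.
(b) We write
$$\begin{aligned}(1,5,4) &= \alpha\mathbf{v}_1 + \beta\mathbf{v}_2 + \gamma\mathbf{v}_3\\ &= \alpha(1,0,3) + \beta(0,2,0) + \gamma(0,3,1)\\ &= (\alpha, 2\beta + 3\gamma, 3\alpha + \gamma).\end{aligned}$$
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} \alpha &= 1\\ 2\beta + 3\gamma &= 5\\ 3\alpha + \gamma &= 4. \end{aligned}$$
The first equation gives $\alpha = 1$, and substituting this into the third gives $\gamma = 1$. Substituting this into the second equation gives $\beta = 1$, so $(1,5,4)$ lies in the subset of $\mathbb{R}^3$ spanned by $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$; that is, $(1,5,4)$ belongs to $\langle\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}\rangle$ and it can be written as
$$(1,5,4) = 1(1,0,3) + 1(0,2,0) + 1(0,3,1).$$
(You might have spotted this and avoided following the formal method.)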


Solution to Exercise C53 
(a) Each vector in $\mathbb{R}^2$ can be written as $(x,y)$. To show that $(x,y)$ is in $\langle\{(1,1), (-1,2)\}\rangle$, we write
$$(x,y) = \alpha(1,1) + \beta(-1,2) = (\alpha - \beta, \alpha + 2\beta).$$
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} \alpha - \beta &= x\\ \alpha + 2\beta &= y. \end{aligned}$$
These equations have solution $\alpha = \frac{1}{3}(2x + y)$ and $\beta = \frac{1}{3}(y - x)$, so any vector in $\mathbb{R}^2$ can be written in terms of $(1,1)$ and $(-1,2)$ as
$$(x,y) = \tfrac{1}{3}(2x + y)(1,1) + \tfrac{1}{3}(y - x)(-1,2).$$
So $\{(1,1), (-1,2)\}$ is a spanning set for $\mathbb{R}^2$.

(b) Each vector in $\mathbb{R}^2$ can be written as $(x,y)$. To show that $(x,y)$ is in $\langle\{(2,-1), (3,2)\}\rangle$, we write
$$(x,y) = \alpha(2,-1) + \beta(3,2) = (2\alpha + 3\beta, -\alpha + 2\beta).$$
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} 2\alpha + 3\beta &= x\\ -\alpha + 2\beta &= y. \end{aligned}$$
These equations have solution $\alpha = \frac{1}{7}(2x - 3y)$ and $\beta = \frac{1}{7}(x + 2y)$, so any vector in $\mathbb{R}^2$ can be written in terms of $(2,-1)$ and $(3,2)$ as
$$(x,y) = \tfrac{1}{7}(2x - 3y)(2,-1) + \tfrac{1}{7}(x + 2y)(3,2).$$
So $\{(2,-1), (3,2)\}$ is a spanning set for $\mathbb{R}^2$.


Solution to Exercise C54 
We write
$$(x,y,z) = \alpha(1,0,0) + \beta(1,1,0) + \gamma(2,0,1) = (\alpha + \beta + 2\gamma, \beta, \gamma).$$
Equating corresponding coordinates, we obtain the system
$$\begin{aligned} \alpha + \beta + 2\gamma &= x\\ \beta &= y\\ \gamma &= z. \end{aligned}$$
Working backwards from the third equation, we find that these equations have solution $\gamma = z$, $\beta = y$ and $\alpha = x - y - 2z$, so any vector in $\mathbb{R}^3$ can be written in terms of $(1,0,0)$, $(1,1,0)$ and $(2,0,1)$ as
$$(x,y,z) = (x - y - 2z)(1,0,0) + y(1,1,0) + z(2,0,1).$$
So $\{(1,0,0), (1,1,0), (2,0,1)\}$ is a spanning set for $\mathbb{R}^3$.


Solution to Exercise C55 


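Each polynomial in $P_4$ can be written as $a + bx + cx^2 + dx^3$. To show that $a + bx + cx^2 + dx^3$ belongs to $\langle\{1+x, 1+x^2, 1+x^3, x\}\rangle$, we write
$$\begin{aligned} a + bx + cx^2 + dx^3 &= \alpha(1+x) + \beta(1+x^2) + \gamma(1+x^3) + \delta x\\ &= (\alpha + \beta + \gamma) + (\alpha + \delta)x + \beta x^2 + \gamma x^3. \end{aligned}$$
Equating corresponding coefficients, we obtain the system
$$\begin{aligned} \alpha + \beta + \gamma &= a\\ \alpha + \delta &= b\\ \beta &= c\\ \gamma &= d. \end{aligned}$$
It has solution $\gamma = d$, $\beta = c$, $\alpha = a - c - d$ and $\delta = b - a + c + d$. So
$$a + bx + cx^2 + dx^3 = (a - c - d)(1+x) + c(1+x^2) + d(1+x^3) + (b - a + c + d)x.$$
Thus $\langle\{1+x, 1+x^2, 1+x^3, x\}\rangle = P_4$.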


Solution to Exercise C56 
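(a) We have
$$\langle S\rangle = \{\alpha(1,0,0) : \alpha \in \mathbb{R}\} = \{(\alpha,0,0) : \alpha \in \mathbb{R}\}.$$
(Geometrically, $\langle S\rangle$ is the $x$-axis.)

(b) We have
$$\begin{aligned}\langle S\rangle &= \left\{\alpha\begin{pmatrix} 2 & 0\\ 0 & 3\end{pmatrix} + \beta\begin{pmatrix} -1 & 0\\ 0 & 2\end{pmatrix} : \alpha, \beta \in \mathbb{R}\right\}\\ &= \left\{\begin{pmatrix} 2\alpha - \beta & 0\\ 0 & 3\alpha + 2\beta\end{pmatrix} : \alpha, \beta \in \mathbb{R}\right\}.\end{aligned}$$
Thus
$$\langle S\rangle \subseteq \left\{\begin{pmatrix} a & 0\\ 0 & b\end{pmatrix} : a, b \in \mathbb{R}\right\}.$$
To show that every $2 \times 2$ diagonal matrix belongs to $\langle S\rangle$, we write
$$\begin{pmatrix} a & 0\\ 0 & b\end{pmatrix} = \begin{pmatrix} 2\alpha - \beta & 0\\ 0 & 3\alpha + 2\beta\end{pmatrix}.$$
Equating corresponding entries, we obtain the system
$$\begin{aligned} 2\alpha - \beta &= a\\ 3\alpha + 2\beta &= b. \end{aligned}$$
It has solution
$$\alpha = \tfrac{1}{7}(2a + b), \quad \beta = \tfrac{1}{7}(2b - 3a),$$
so
$$\langle S\rangle = \left\{\begin{pmatrix} a & 0\\ 0 & b\end{pmatrix} : a, b \in \mathbb{R}\right\},$$
the set of all $2 \times 2$ diagonal matrices.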


Solution to Exercise C57 


(a) These two vectors are linearly independent because neither is a multiple of the other. (In this case there is no need to use Strategy C7.)

(b) Using Strategy C7, we write
$$\alpha(1,-1) + \beta(1,1) + \gamma(2,1) = (0,0).$$
This gives the system
$$\begin{aligned} \alpha + \beta + 2\gamma &= 0\\ -\alpha + \beta + \gamma &= 0. \end{aligned}$$
Adding the equations gives $2\beta + 3\gamma = 0$, or $\beta = -\frac{3}{2}\gamma$, and substituting this into the first equation gives $\alpha = -\frac{1}{2}\gamma$; that is, $\gamma = -2\alpha$ and $\beta = 3\alpha$. The solution set of the system is
$$\alpha = k, \quad \beta = 3k, \quad \gamma = -2k, \qquad k \in \mathbb{R},$$
so there are infinitely many solutions. For example, $k = 1$ gives
$$(1,-1) + 3(1,1) - 2(2,1) = (0,0).$$
So the set $\{(1,-1), (1,1), (2,1)\}$ is linearly dependent.

Alternatively, you may have expressed the solution set here in terms of $\gamma$ and found another solution: any solution (where $\alpha$, $\beta$ and $\gamma$ are not all zero) is sufficient to show that the vectors are linearly dependent.

(c) These two vectors are linearly independent because neither is a multiple of the other. (In this case there is no need to use Strategy C7.)

(d) We write
$$\alpha(1,0,0) + \beta(1,1,0) + \gamma(1,1,1) = (0,0,0).$$
This gives the system
$$\begin{aligned} \alpha + \beta + \gamma &= 0\\ \beta + \gamma &= 0\\ \gamma &= 0. \end{aligned}$$
The third equation gives $\gamma = 0$, and substituting into the second equation gives $\beta = 0$. Finally, substituting into the first equation gives $\alpha = 0$. The only solution is $\alpha = \beta = \gamma = 0$.

Therefore the set $\{(1,0,0), (1,1,0), (1,1,1)\}$ is linearly independent.

(e) These two vectors are linearly independent because neither is a multiple of the other. (Again, there is no need to use Strategy C7.)


Solution to Exercise C58 


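(a) The set $\{1, x, x^2, x^3, 1 + x + x^2 + x^3\}$ is linearly dependent because the fifth vector is the sum of the first four vectors. So
$$1 + x + x^2 + x^3 - (1 + x + x^2 + x^3) = 0.$$
(b) The set $S$ is linearly independent because neither matrix is a multiple of the other.
(c) We apply Strategy C7.
We write
$$\alpha\begin{pmatrix} 1 & 1\\ 0 & 1\end{pmatrix} + \beta\begin{pmatrix} 1 & 0\\ 1 & 1\end{pmatrix} + \gamma\begin{pmatrix} 1 & 1\\ 1 & 1\end{pmatrix} = \begin{pmatrix} 0 & 0\\ 0 & 0\end{pmatrix},$$
which can be written as
$$\begin{pmatrix} \alpha + \beta + \gamma & \alpha + \gamma\\ \beta + \gamma & \alpha + \beta + \gamma\end{pmatrix} = \begin{pmatrix} 0 & 0\\ 0 & 0\end{pmatrix}.$$
Equating corresponding entries, we obtain the system
$$\begin{aligned} \alpha + \beta + \gamma &= 0\\ \alpha + \gamma &= 0\\ \beta + \gamma &= 0\\ \alpha + \beta + \gamma &= 0. \end{aligned}$$
Subtracting the second equation from the first, and the third from the fourth, we get $\beta = 0$ and $\alpha = 0$. Substituting these values in the first and fourth gives $\gamma = 0$ also. Therefore the only solution to this system is $\alpha = \beta = \gamma = 0$. Therefore the set $S$ is a linearly independent subset of $M_{2,2}$.

(d) The set $\{1 + i, 1 - i\}$ is linearly independent because neither vector is a (real) multiple of the other.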


Solution to Exercise C59 


(a) None of the vectors in this set has a non-zero 
x-component; so whenever x # 0, we cannot write 
(x,y,z) in terms of these three vectors. 


Therefore this set of vectors is not a basis for R® 
because it does not span R?. 


(If you had not spotted the zero x-component and 
had followed Strategy C8, you would have 
discovered that this set is not linearly independent: 
for example, 


16(0, 1,2) — 11(0, 2,3) + (0,6,1) = (0,0, 0). 
Therefore this set of vectors is not a basis for R3.) 
(b) We check both conditions in Strategy C8.
Using Strategy C7, we write

$$\alpha(1, 2, 1) + \beta(1, 0, -1) + \gamma(0, 3, 1) = (0, 0, 0),$$
which simplifies to

$$(\alpha + \beta,\ 2\alpha + 3\gamma,\ \alpha - \beta + \gamma) = (0, 0, 0).$$


Equating corresponding coordinates, we obtain the
system

$$\begin{aligned} \alpha + \beta &= 0 \\ 2\alpha + 3\gamma &= 0 \\ \alpha - \beta + \gamma &= 0. \end{aligned}$$


Adding the third equation to the first gives
$2\alpha + \gamma = 0$, and subtracting this from the second
equation gives $2\gamma = 0$, so $\gamma = 0$. Substituting this into the
second equation gives $\alpha = 0$. Finally, substituting
$\alpha = 0$ into the first equation gives $\beta = 0$. The only
solution is $\alpha = \beta = \gamma = 0$.


Therefore the set is linearly independent.
We apply Strategy C6.


Each vector in $\mathbb{R}^3$ can be written as $(x, y, z)$, with
$x, y, z \in \mathbb{R}$. To show that $(x, y, z)$ is in

$$\langle\{(1, 2, 1), (1, 0, -1), (0, 3, 1)\}\rangle,$$
we write
$$(x, y, z) = \alpha(1, 2, 1) + \beta(1, 0, -1) + \gamma(0, 3, 1).$$


Equating corresponding coordinates, we obtain the
system

$$\begin{aligned} \alpha + \beta &= x \\ 2\alpha + 3\gamma &= y \\ \alpha - \beta + \gamma &= z. \end{aligned}$$


Adding the third equation to the first gives
$2\alpha + \gamma = x + z$, and subtracting this from the
second equation gives $\gamma = \tfrac{1}{2}(y - x - z)$.
Substituting this into the second equation gives
$\alpha = \tfrac{1}{4}(3x - y + 3z)$. Finally, substituting for $\alpha$ in
the first equation gives $\beta = \tfrac{1}{4}(x + y - 3z)$. We have
a solution, so any vector in $\mathbb{R}^3$ can be written as


$$(x, y, z) = \tfrac{1}{4}(3x - y + 3z)(1, 2, 1) + \tfrac{1}{4}(x + y - 3z)(1, 0, -1) + \tfrac{1}{2}(y - x - z)(0, 3, 1).$$
Therefore the set of vectors spans $\mathbb{R}^3$.
Thus $\{(1, 2, 1), (1, 0, -1), (0, 3, 1)\}$ is a basis for $\mathbb{R}^3$.
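The two conditions of Strategy C8 can also be cross-checked mechanically. The sketch below is not the module's method: it assumes Python with exact rational arithmetic (`fractions.Fraction`) and row-reduces the given vectors, so that the number of pivots (the rank) equals the dimension of their span. The helper names `rank`, `is_linearly_independent`, `spans_Rn` and `is_basis` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction


def rank(vectors):
    """Return the dimension of the span of `vectors`, found by exact row reduction."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    if not rows:
        return 0
    r = 0  # number of pivot rows found so far
    for col in range(len(rows[0])):
        # Look for a row (at or below row r) with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear this column in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def is_linearly_independent(vectors):
    return rank(vectors) == len(vectors)


def spans_Rn(vectors, n):
    return rank(vectors) == n


def is_basis(vectors, n):
    """Both conditions of Strategy C8 for R^n: linearly independent and spanning."""
    return is_linearly_independent(vectors) and spans_Rn(vectors, n)


print(rank([(0, 1, 2), (0, 2, 3), (0, 6, 1)]))         # Exercise C59(a): 2
print(is_basis([(1, 2, 1), (1, 0, -1), (0, 3, 1)], 3))  # Exercise C59(b): True
```

Using `Fraction` rather than floats means a pivot is treated as zero only when it is exactly zero, so rounding cannot change the answer.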
(c) Here we have 
(1,1,1) = (1,0,0) + (0, 1,0) + (0,0, 1), 
so these vectors are not linearly independent. 


Therefore the set
$$\{(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)\}$$


is not a basis for $\mathbb{R}^3$.


Solution to Exercise C60 
We check both conditions in Strategy C8. 


Solutions to exercises 


This set is linearly independent because there are 
only two vectors in the set, and neither vector is a 
multiple of the other. 


We apply Strategy C6. 


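Each vector in $\mathbb{R}^4$ can be written as $(x, y, z, w)$,
with $x, y, z, w \in \mathbb{R}$. To show that $(x, y, z, w)$ is in


$$\langle\{(1, 2, -1, -1), (-1, 5, 1, 3)\}\rangle,$$
we write
$$(x, y, z, w) = \alpha(1, 2, -1, -1) + \beta(-1, 5, 1, 3).$$


Equating corresponding coordinates, we obtain the
system


$$\begin{aligned} \alpha - \beta &= x \\ 2\alpha + 5\beta &= y \\ -\alpha + \beta &= z \\ -\alpha + 3\beta &= w. \end{aligned}$$


Adding the first and third equations gives
$x + z = 0$. This contradicts the assumption that $x$,
$y$, $z$ and $w$ can take any real values (for example,
$(1, 0, 0, 0)$ cannot be written in this form), so


$$\{(1, 2, -1, -1), (-1, 5, 1, 3)\}$$
is not a spanning set for $\mathbb{R}^4$.


Thus the set $\{(1, 2, -1, -1), (-1, 5, 1, 3)\}$ is not a
basis for $\mathbb{R}^4$.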


Solution to Exercise C61 
We check both conditions in Strategy C8. 
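Using Strategy C7 we write


$$\alpha\begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix} + \beta\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} + \gamma\begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix} + \delta\begin{pmatrix} -3 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$


which simplifies to


$$\begin{pmatrix} \alpha + 2\gamma - 3\delta & -\beta + \delta \\ \alpha + \beta & \gamma \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$
Equating corresponding entries, we obtain the
system


$$\begin{aligned} \alpha + 2\gamma - 3\delta &= 0 \\ -\beta + \delta &= 0 \\ \alpha + \beta &= 0 \\ \gamma &= 0. \end{aligned}$$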


Unit C2 Vector spaces 


From the fourth equation we have $\gamma = 0$, and from
the second and third $\alpha = -\beta = -\delta$. Substituting
into the first equation gives $\alpha + 3\alpha = 0$, so $\alpha = 0$.
The only solution is therefore $\alpha = \beta = \gamma = \delta = 0$.


Therefore the set is linearly independent. 

We apply Strategy C6. 

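Each $2 \times 2$ matrix can be written as $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ with
$a, b, c, d \in \mathbb{R}$. To show this is in $\langle S\rangle$ we write


$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \alpha\begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix} + \beta\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} + \gamma\begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix} + \delta\begin{pmatrix} -3 & 1 \\ 0 & 0 \end{pmatrix}.$$
Equating corresponding entries, we obtain the
system


$$\begin{aligned} \alpha + 2\gamma - 3\delta &= a \\ -\beta + \delta &= b \\ \alpha + \beta &= c \\ \gamma &= d. \end{aligned}$$


From the fourth equation we have $\gamma = d$, and
adding the second equation to the third gives
$\alpha + \delta = b + c$. Substituting for $\gamma$ in the first
equation gives $\alpha - 3\delta = a - 2d$. These last two
equations give $\delta = \tfrac{1}{4}(b + c - a + 2d)$.

Then, by substitution, $\alpha = \tfrac{1}{4}(a + 3b + 3c - 2d)$ and
$\beta = \tfrac{1}{4}(-a - 3b + c + 2d)$.

We have a solution $\alpha = \tfrac{1}{4}(a + 3b + 3c - 2d)$,
$\beta = \tfrac{1}{4}(-a - 3b + c + 2d)$, $\gamma = d$ and
$\delta = \tfrac{1}{4}(b + c - a + 2d)$.

Therefore the set of matrices $S$ spans the set $M_{2,2}$
of all $2 \times 2$ matrices.


Thus $S$ is a basis for $M_{2,2}$.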


Solution to Exercise C62 


(a) For the basis $E = \{(1, 2), (-3, 1)\}$, we have
$$\begin{aligned} (2, 1)_E &= 2(1, 2) + 1(-3, 1) \\ &= (2, 4) + (-3, 1) \\ &= (-1, 5). \end{aligned}$$
(b) For the basis
$E = \{(1, 0, 2), (-1, 1, 3), (2, -2, 0)\}$, we have


$$\begin{aligned} (1, 1, -1)_E &= 1(1, 0, 2) + 1(-1, 1, 3) - 1(2, -2, 0) \\ &= (1, 0, 2) + (-1, 1, 3) - (2, -2, 0) \\ &= (-2, 3, 5). \end{aligned}$$


Solution to Exercise C63 
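(a) We write
$$(5, -4) = \alpha(1, 2) + \beta(-3, 1).$$
Equating corresponding coordinates, we obtain the
system
$$\begin{aligned} \alpha - 3\beta &= 5 \\ 2\alpha + \beta &= -4. \end{aligned}$$
Solving these equations gives $\alpha = -1$, $\beta = -2$, so
$$\begin{aligned} (5, -4) &= -1(1, 2) - 2(-3, 1) \\ &= (-1, -2)_E. \end{aligned}$$
(b) We write
$$(-3, 5, 7) = \alpha(1, 0, 2) + \beta(-1, 1, 3) + \gamma(2, -2, 0).$$
Equating corresponding coordinates, we obtain the
system
$$\begin{aligned} \alpha - \beta + 2\gamma &= -3 \\ \beta - 2\gamma &= 5 \\ 2\alpha + 3\beta &= 7. \end{aligned}$$


Adding the first and second equations gives $\alpha = 2$,
and substituting this into the third equation gives
$\beta = 1$. Substituting for $\beta$ in the second equation
gives $\gamma = -2$. So


$$\begin{aligned} (-3, 5, 7) &= 2(1, 0, 2) + 1(-1, 1, 3) - 2(2, -2, 0) \\ &= (2, 1, -2)_E. \end{aligned}$$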
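Finding $E$-coordinates, as in Exercises C62 and C63, amounts to solving a square linear system whose columns are the basis vectors. A minimal sketch, assuming Python and exact `fractions.Fraction` arithmetic; `coordinates` and `from_coordinates` are hypothetical helper names, and Gauss–Jordan elimination stands in for the hand elimination used in the solutions.

```python
from fractions import Fraction


def coordinates(v, basis):
    """Return the scalars a_1, ..., a_n with v = a_1 b_1 + ... + a_n b_n.

    Assumes `basis` is a basis of R^n; raises ValueError otherwise.
    """
    n = len(basis)
    # Augmented matrix: column j holds basis vector j, the last column holds v.
    m = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])] for i in range(n)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if m[i][col] != 0), None)
        if pivot is None:
            raise ValueError("the given vectors do not form a basis")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]  # scale the pivot entry to 1
        for i in range(n):
            if i != col and m[i][col] != 0:
                f = m[i][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[col])]
    return [m[i][n] for i in range(n)]


def from_coordinates(coords, basis):
    """Inverse direction: turn E-coordinates back into a vector of R^n."""
    return tuple(sum(Fraction(c) * b[i] for c, b in zip(coords, basis))
                 for i in range(len(basis[0])))


E2 = [(1, 2), (-3, 1)]
E3 = [(1, 0, 2), (-1, 1, 3), (2, -2, 0)]
print(coordinates((5, -4), E2))     # Exercise C63(a)
print(coordinates((-3, 5, 7), E3))  # Exercise C63(b)
```

The same function reproduces the general formula of Exercise C59(b): for example, the coordinates of $(1, 0, 0)$ with respect to $\{(1, 2, 1), (1, 0, -1), (0, 3, 1)\}$ come out as $\tfrac{3}{4}$, $\tfrac{1}{4}$, $-\tfrac{1}{2}$.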


Solution to Exercise C64 
We apply Strategy C9. 


(a) This set contains only two vectors, not three,
so it cannot be a basis for $\mathbb{R}^3$.


(Neither vector is a multiple of the other, so it is,
however, linearly independent.)


(b) This set contains three vectors, so it may be a
basis for $\mathbb{R}^3$.


We write


$$\alpha(1, 0, 1) + \beta(1, 0, -1) + \gamma(0, 1, 1) = (0, 0, 0).$$


Equating corresponding coordinates, we obtain the
system


$$\begin{aligned} \alpha + \beta &= 0 \\ \gamma &= 0 \\ \alpha - \beta + \gamma &= 0. \end{aligned}$$


The second equation gives $\gamma = 0$. Substituting this
into the third equation gives $\alpha - \beta = 0$. Adding
this new equation to the first equation gives $2\alpha = 0$, so $\alpha = 0$,
and hence $\beta = 0$. The only solution is
$\alpha = \beta = \gamma = 0$.

Therefore the set is linearly independent.


The set contains three vectors and is linearly
independent; therefore it is a basis for $\mathbb{R}^3$.


(c) Here we have

$$(1, -1, 0) + (2, 1, 4) = (3, 0, 4),$$
so this set is not linearly independent.
Therefore this set is not a basis for $\mathbb{R}^3$.


(It does, however, contain the correct number of
vectors.)


(d) This set contains four vectors, so it cannot be
a basis for $\mathbb{R}^3$.


(Alternatively, here we have 
(1,1, 1) = (1,0,0) + (0, 1,0) + (0,0, 1), 


so this set is also linearly dependent.) 


Solution to Exercise C65 
We apply Strategy C9. 


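(a) This set contains four vectors and $M_{2,2}$ has
dimension 4, so it may be a basis.


Using Strategy C7 we write
$$\alpha\begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix} + \beta\begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} + \gamma\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} + \delta\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$


which simplifies to


$$\begin{pmatrix} \alpha + \gamma & \beta + \gamma + \delta \\ \alpha + \delta & \beta + \delta \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$


Equating corresponding entries, we obtain the
system


$$\begin{aligned} \alpha + \gamma &= 0 \\ \beta + \gamma + \delta &= 0 \\ \alpha + \delta &= 0 \\ \beta + \delta &= 0. \end{aligned}$$


From the first, third and fourth equations we have
$\alpha = \beta = -\gamma = -\delta$. Substituting in the second
gives $-\beta = 0$. The only solution is therefore
$\alpha = \beta = \gamma = \delta = 0$.
Therefore the set is linearly independent.

The set $S$ contains four vectors and is linearly
independent so is a basis for $M_{2,2}$.

(Compare the length of this solution to that of
Exercise C61 using Strategy C8.)

(b) This set contains two vectors and $P_2$ has
dimension 2, so it may be a basis.


This set is linearly independent because there are
only two vectors in the set, and neither vector is a
multiple of the other.


So by Strategy C9, the set is a basis for $P_2$.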
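Strategy C9 reduces the basis question to linear independence when the number of vectors equals the dimension. A standard alternative test, not used in these solutions, is that $n$ vectors in an $n$-dimensional space form a basis exactly when the $n \times n$ determinant with those vectors as rows is non-zero. The sketch below assumes Python with `fractions.Fraction`, and, for $M_{2,2}$, that each matrix is flattened to a vector in $\mathbb{R}^4$ by reading its entries row by row; `det` and `flatten` are hypothetical names.

```python
from fractions import Fraction


def det(rows):
    """Determinant of a square matrix by exact Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    n = len(m)
    d = Fraction(1)
    for col in range(n):
        pivot = next((i for i in range(col, n) if m[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot: the rows are linearly dependent
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            d = -d  # a row swap changes the sign of the determinant
        d *= m[col][col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[col])]
    return d


def flatten(matrix):
    """Read a matrix row by row, e.g. ((a, b), (c, d)) -> (a, b, c, d)."""
    return tuple(x for row in matrix for x in row)


# The four matrices of Exercise C65(a), flattened into R^4.
S = [((1, 0), (1, 0)), ((0, 1), (0, 1)), ((1, 1), (0, 0)), ((0, 1), (1, 1))]
print(det([flatten(A) for A in S]) != 0)  # True, so S is a basis for M_{2,2}
```

Flattening preserves addition and scalar multiplication, which is why independence in $M_{2,2}$ can be tested in $\mathbb{R}^4$.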


Solution to Exercise C66 


The set $S$ is a subset of $\mathbb{R}^2$, so we use Strategy C10.
If $x = 0$, then $(x, -2x) = (0, 0)$, so $S$ contains the
zero vector of $\mathbb{R}^2$.
Let $\mathbf{v}_1 = (x_1, -2x_1)$ and $\mathbf{v}_2 = (x_2, -2x_2)$ belong
to $S$. Then
$$\begin{aligned} \mathbf{v}_1 + \mathbf{v}_2 &= (x_1, -2x_1) + (x_2, -2x_2) \\ &= (x_1 + x_2, -2x_1 - 2x_2) \\ &= (x_1 + x_2, -2(x_1 + x_2)). \end{aligned}$$
This vector has the correct form for a vector in $S$,
since $x_1 + x_2 \in \mathbb{R}$, so $S$ is closed under vector
addition.
Let $\mathbf{v} = (x, -2x) \in S$ and $\alpha \in \mathbb{R}$. Then
$$\begin{aligned} \alpha\mathbf{v} &= \alpha(x, -2x) \\ &= (\alpha x, \alpha(-2x)) \\ &= (\alpha x, -2(\alpha x)). \end{aligned}$$
This vector has the correct form for a vector in $S$,
since $\alpha x \in \mathbb{R}$, so $S$ is closed under scalar
multiplication.


Since conditions (1), (2) and (3) are satisfied, $S$ is
a subspace of $\mathbb{R}^2$.


(This subspace is the line through the origin with
equation $y = -2x$.)


Solution to Exercise C67 


In each case the set S is a subset of V, so we use 
Strategy C10. 
(a) If 0 € S, then (x, + 2) = (0,0) for some 
number x. Equating coordinates, we obtain the 
system 

x =0 

T= =l, 
This system is inconsistent so has no solution. 


Therefore $\mathbf{0}$ does not belong to $S$ and condition (1)
is not satisfied. Hence $S$ is not a subspace of $\mathbb{R}^2$.


(b) If $x = y = z = 0$, then
$$(x, y, z, x + 2y - z) = (0, 0, 0, 0),$$
so $S$ contains the zero vector of $\mathbb{R}^4$.
Let $\mathbf{v}_1 = (x_1, y_1, z_1, x_1 + 2y_1 - z_1)$ and
$\mathbf{v}_2 = (x_2, y_2, z_2, x_2 + 2y_2 - z_2)$ belong to $S$. Then
$$\begin{aligned} \mathbf{v}_1 + \mathbf{v}_2 &= (x_1, y_1, z_1, x_1 + 2y_1 - z_1) + (x_2, y_2, z_2, x_2 + 2y_2 - z_2) \\ &= (x_1 + x_2, y_1 + y_2, z_1 + z_2, x_1 + 2y_1 - z_1 + x_2 + 2y_2 - z_2) \\ &= (x_1 + x_2, y_1 + y_2, z_1 + z_2, (x_1 + x_2) + 2(y_1 + y_2) - (z_1 + z_2)). \end{aligned}$$
This vector has the correct form for a vector in $S$,
since $x_1 + x_2, y_1 + y_2, z_1 + z_2 \in \mathbb{R}$, so $S$ is closed
under vector addition.


Let $\mathbf{v} = (x, y, z, x + 2y - z) \in S$ and $\alpha \in \mathbb{R}$. Then
$$\begin{aligned} \alpha\mathbf{v} &= \alpha(x, y, z, x + 2y - z) \\ &= (\alpha x, \alpha y, \alpha z, \alpha(x + 2y - z)) \\ &= (\alpha x, \alpha y, \alpha z, (\alpha x) + 2(\alpha y) - (\alpha z)). \end{aligned}$$
This vector has the correct form for a vector in $S$,
since $\alpha x, \alpha y, \alpha z \in \mathbb{R}$, so $S$ is closed under scalar
multiplication.


Since conditions (1), (2) and (3) are satisfied, $S$ is
a subspace of $\mathbb{R}^4$.


Solution to Exercise C68 


In each case the set $S$ is a subset of $V$, so we use
Strategy C10.


(a) The zero vector of $P_3$ is $0 + 0x + 0x^2 = 0$. If
$a = b = 0$, then $p(x) = 0 + 0x = 0$, so $S$ contains
the zero vector.


Let $p_1(x) = a_1 + b_1x$ and $p_2(x) = a_2 + b_2x$ belong
to $S$. Then
$$\begin{aligned} p_1(x) + p_2(x) &= a_1 + b_1x + a_2 + b_2x \\ &= (a_1 + a_2) + (b_1 + b_2)x. \end{aligned}$$
This polynomial has the correct form for a vector
in $S$, since $a_1 + a_2, b_1 + b_2 \in \mathbb{R}$, so $S$ is closed under
vector addition.


Let $p(x) = a + bx \in S$ and $\alpha \in \mathbb{R}$. Then
$$\alpha p(x) = \alpha a + \alpha bx = (\alpha a) + (\alpha b)x.$$


This polynomial has the correct form for a vector
in $S$, since $\alpha a, \alpha b \in \mathbb{R}$, so $S$ is closed under scalar
multiplication.


Since conditions (1), (2) and (3) are satisfied, $S$ is
a subspace of $V$.


(b) The zero vector of $P_3$ is $0 + 0x + 0x^2 = 0$,
which is not of the form $x + ax^2$ for a vector in $S$.
Therefore $\mathbf{0}$ does not belong to $S$ and condition (1)
fails. Hence $S$ is not a subspace of $P_3$.


(Alternatively, you may have spotted that
conditions (2) and (3) also fail. Using a
particularly simple vector can make the
calculations to show this easy: by setting, for
example, $a = 0$, we see that $p(x) = x$ belongs to $S$.
The sum $p(x) + p(x) = 2x$, however, does not
belong to $S$, and for $\alpha \in \mathbb{R}$ not equal to 1, the
scalar product $\alpha x$ is also not in $S$.)


(c) The zero vector of $M_{2,2}$ is $\mathbf{0} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$, which
is not of the form $\begin{pmatrix} a & 1 \\ 0 & d \end{pmatrix}$ for a vector in $S$.
Therefore $\mathbf{0}$ does not belong to $S$ and condition (1)
fails. Hence $S$ is not a subspace of $M_{2,2}$.


Solution to Exercise C69 


Since $\{(1, -2, 0), (0, 3, 3)\}$ is a linearly independent
set, the subspace it spans is a two-dimensional
subspace of $\mathbb{R}^3$ and is therefore a plane through the
origin with equation


$$ax + by + cz = 0,$$
where $a$, $b$, $c$ are not all zero.


Since the vectors in the spanning set lie in the
plane, the values of $a$, $b$ and $c$ must satisfy the
system

$$\begin{aligned} a - 2b &= 0 \\ 3b + 3c &= 0. \end{aligned}$$

The first of these equations gives $a = 2b$, and the
second equation gives $c = -b$, so the subspace is
the plane with equation $2bx + by - bz = 0$. Since $a$, $b$
and $c$ are not all zero, $b \neq 0$, so dividing by $b$ gives,
equivalently,


$$2x + y - z = 0.$$
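For a plane through the origin spanned by two vectors of $\mathbb{R}^3$, a normal vector $(a, b, c)$ can also be read off from the cross product; this is a different route from the one above, shown only as a check. A minimal Python sketch, assuming the two vectors are linearly independent with integer coordinates; `cross` and `plane_through_origin` are hypothetical names.

```python
from functools import reduce
from math import gcd


def cross(u, v):
    """Cross product u x v in R^3; it is orthogonal to both u and v."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def plane_through_origin(u, v):
    """Coefficients (a, b, c) of ax + by + cz = 0 for the plane spanned by u and v.

    Assumes u and v are linearly independent vectors with integer coordinates.
    The normal is scaled to have coprime entries and a positive first non-zero entry.
    """
    n = cross(u, v)
    g = reduce(gcd, n)
    n = tuple(x // g for x in n)
    first = next(x for x in n if x != 0)
    return n if first > 0 else tuple(-x for x in n)


print(plane_through_origin((1, -2, 0), (0, 3, 3)))  # (2, 1, -1), i.e. 2x + y - z = 0
```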


Solution to Exercise C70 
Since
$$\begin{aligned} (x, y, z, x + 2y - z) &= (x, 0, 0, x) + (0, y, 0, 2y) + (0, 0, z, -z) \\ &= x(1, 0, 0, 1) + y(0, 1, 0, 2) + z(0, 0, 1, -1), \end{aligned}$$
any vector in $S$ can be written as a linear
combination of the vectors in the set


$$\{(1, 0, 0, 1), (0, 1, 0, 2), (0, 0, 1, -1)\},$$
so this set spans $S$.


To check whether these vectors are linearly
independent, we write
$$\alpha(1, 0, 0, 1) + \beta(0, 1, 0, 2) + \gamma(0, 0, 1, -1) = (0, 0, 0, 0).$$
This gives the system


$$\begin{aligned} \alpha &= 0 \\ \beta &= 0 \\ \gamma &= 0 \\ \alpha + 2\beta - \gamma &= 0, \end{aligned}$$


and hence $\alpha = \beta = \gamma = 0$. Therefore the set is
linearly independent.

So $\{(1, 0, 0, 1), (0, 1, 0, 2), (0, 0, 1, -1)\}$ is a basis
for $S$. Therefore $S$ has dimension 3.


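Solution to Exercise C71
(a) We have
$$(2, 1, 1) \cdot (1, -4, 2) = 2 \times 1 + 1 \times (-4) + 1 \times 2 = 2 - 4 + 2 = 0,$$

so $(2, 1, 1)$ and $(1, -4, 2)$ are orthogonal.
(b) We have
$$\mathbf{v}_1 \cdot \mathbf{v}_2 = -2 \times 9 + 6 \times 2 + 1 \times 6 = 0,$$
so $\mathbf{v}_1$ and $\mathbf{v}_2$ are orthogonal.

$$\mathbf{v}_1 \cdot \mathbf{v}_3 = -2 \times 4 + 6 \times (-15) + 1 \times (-1) = -99,$$

which is non-zero, so $\mathbf{v}_1$ and $\mathbf{v}_3$ are not orthogonal.

$$\mathbf{v}_2 \cdot \mathbf{v}_3 = 9 \times 4 + 2 \times (-15) + 6 \times (-1) = 0,$$

so $\mathbf{v}_2$ and $\mathbf{v}_3$ are orthogonal.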


Solution to Exercise C72 

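(a) Let $\mathbf{v}_1 = (3, 4, 0)$, $\mathbf{v}_2 = (8, -6, 0)$ and
$\mathbf{v}_3 = (0, 0, 5)$. Then
$$\begin{aligned} \mathbf{v}_1 \cdot \mathbf{v}_2 &= 3 \times 8 + 4 \times (-6) + 0 \times 0 = 0, \\ \mathbf{v}_1 \cdot \mathbf{v}_3 &= 3 \times 0 + 4 \times 0 + 0 \times 5 = 0, \\ \mathbf{v}_2 \cdot \mathbf{v}_3 &= 8 \times 0 + (-6) \times 0 + 0 \times 5 = 0. \end{aligned}$$

Thus $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthogonal set in $\mathbb{R}^3$. Since
there are three non-zero vectors in this set, it is an
orthogonal basis for $\mathbb{R}^3$.

(b) We apply Strategy C11.

$$a_1 = \frac{\mathbf{v}_1 \cdot \mathbf{u}}{\mathbf{v}_1 \cdot \mathbf{v}_1} = \frac{(3, 4, 0) \cdot (10, 0, 4)}{(3, 4, 0) \cdot (3, 4, 0)} = \frac{30}{25} = \frac{6}{5},$$

$$a_2 = \frac{\mathbf{v}_2 \cdot \mathbf{u}}{\mathbf{v}_2 \cdot \mathbf{v}_2} = \frac{(8, -6, 0) \cdot (10, 0, 4)}{(8, -6, 0) \cdot (8, -6, 0)} = \frac{80}{100} = \frac{4}{5}$$
and
$$a_3 = \frac{\mathbf{v}_3 \cdot \mathbf{u}}{\mathbf{v}_3 \cdot \mathbf{v}_3} = \frac{(0, 0, 5) \cdot (10, 0, 4)}{(0, 0, 5) \cdot (0, 0, 5)} = \frac{20}{25} = \frac{4}{5}.$$
Thus $(10, 0, 4) = \tfrac{6}{5}(3, 4, 0) + \tfrac{4}{5}(8, -6, 0) + \tfrac{4}{5}(0, 0, 5)$.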
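The scalar products in Exercises C71 and C72, and the coefficient formula of Strategy C11, translate directly into code. A minimal Python sketch using exact `fractions.Fraction` arithmetic; the names `dot`, `is_orthogonal_set` and `orthogonal_coefficients` are hypothetical.

```python
from fractions import Fraction


def dot(u, v):
    """Scalar product of two vectors of the same length, computed exactly."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))


def is_orthogonal_set(vectors):
    """True if every pair of distinct vectors has scalar product 0."""
    return all(dot(vectors[i], vectors[j]) == 0
               for i in range(len(vectors)) for j in range(i + 1, len(vectors)))


def orthogonal_coefficients(u, basis):
    """Coefficients a_i = (v_i . u) / (v_i . v_i) of u with respect to an orthogonal basis."""
    if not is_orthogonal_set(basis):
        raise ValueError("basis is not orthogonal")
    return [dot(v, u) / dot(v, v) for v in basis]


basis = [(3, 4, 0), (8, -6, 0), (0, 0, 5)]
print(orthogonal_coefficients((10, 0, 4), basis))  # [Fraction(6, 5), Fraction(4, 5), Fraction(4, 5)]
```

No linear system has to be solved: orthogonality makes each coefficient a single quotient of scalar products.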


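Solution to Exercise C73

(a)
$$\begin{aligned} (1, 2, -1, 0) \cdot (0, -5, 6, -3) &= 1 \times 0 + 2 \times (-5) + (-1) \times 6 + 0 \times (-3) \\ &= 0 - 10 - 6 + 0 \\ &= -16. \end{aligned}$$
(b)
$$\begin{aligned} (1, 2, 3, 4, 5, 6) \cdot (3, 2, 1, 0, -1, -2) &= 1 \times 3 + 2 \times 2 + 3 \times 1 + 4 \times 0 + 5 \times (-1) + 6 \times (-2) \\ &= 3 + 4 + 3 + 0 - 5 - 12 \\ &= -7. \end{aligned}$$


Solution to Exercise C74
We check that each pair of vectors is orthogonal by
forming the scalar product of each pair of vectors
in the set:
$$\begin{aligned} (1, 0, 0, 0, 0) \cdot (0, 2, 0, 0, 0) &= 0 + 0 + 0 + 0 + 0 = 0, \\ (1, 0, 0, 0, 0) \cdot (0, 0, 1, 1, 0) &= 0 + 0 + 0 + 0 + 0 = 0, \\ (0, 2, 0, 0, 0) \cdot (0, 0, 1, 1, 0) &= 0 + 0 + 0 + 0 + 0 = 0. \end{aligned}$$
Therefore these three vectors form an orthogonal
set in $\mathbb{R}^5$.


Solution to Exercise C75
We check that each pair of vectors is orthogonal by
forming the scalar product of each pair of vectors
in the set:
$$\begin{aligned} (1, 2, 1, 0) \cdot (-1, 1, -1, 1) &= -1 + 2 - 1 + 0 = 0, \\ (1, 2, 1, 0) \cdot (1, 0, -1, 0) &= 1 + 0 - 1 + 0 = 0, \\ (1, 2, 1, 0) \cdot (1, -1, 1, 3) &= 1 - 2 + 1 + 0 = 0, \\ (-1, 1, -1, 1) \cdot (1, 0, -1, 0) &= -1 + 0 + 1 + 0 = 0, \\ (-1, 1, -1, 1) \cdot (1, -1, 1, 3) &= -1 - 1 - 1 + 3 = 0, \\ (1, 0, -1, 0) \cdot (1, -1, 1, 3) &= 1 + 0 - 1 + 0 = 0. \end{aligned}$$
Therefore these vectors form an orthogonal set
in $\mathbb{R}^4$. Since there are four non-zero vectors in this
set, these vectors form an orthogonal basis for $\mathbb{R}^4$
by Corollary C33.


Solution to Exercise C76
We apply Strategy C12.

Let $\mathbf{v}_1 = (1, 2, 1, 0)$, $\mathbf{v}_2 = (-1, 1, -1, 1)$,
$\mathbf{v}_3 = (1, 0, -1, 0)$, $\mathbf{v}_4 = (1, -1, 1, 3)$ and
$\mathbf{u} = (1, 2, 3, 4)$. Then
$$\begin{aligned} a_1 &= \frac{\mathbf{v}_1 \cdot \mathbf{u}}{\mathbf{v}_1 \cdot \mathbf{v}_1} = \frac{8}{6} = \frac{4}{3}, & a_2 &= \frac{\mathbf{v}_2 \cdot \mathbf{u}}{\mathbf{v}_2 \cdot \mathbf{v}_2} = \frac{2}{4} = \frac{1}{2}, \\ a_3 &= \frac{\mathbf{v}_3 \cdot \mathbf{u}}{\mathbf{v}_3 \cdot \mathbf{v}_3} = \frac{-2}{2} = -1, & a_4 &= \frac{\mathbf{v}_4 \cdot \mathbf{u}}{\mathbf{v}_4 \cdot \mathbf{v}_4} = \frac{14}{12} = \frac{7}{6}. \end{aligned}$$
Thus
$$(1, 2, 3, 4) = \tfrac{4}{3}(1, 2, 1, 0) + \tfrac{1}{2}(-1, 1, -1, 1) - (1, 0, -1, 0) + \tfrac{7}{6}(1, -1, 1, 3).$$


Solution to Exercise C77
(a) Using $\mathbf{x} \cdot \mathbf{n} = 0$, we have
$$(x, y, z) \cdot (3, -4, 5) = 0;$$
that is, the equation of the plane is
$$3x - 4y + 5z = 0.$$
(b) We have
$$\mathbf{w}_1 \cdot \mathbf{n} = (4, 3, 0) \cdot (3, -4, 5) = 12 - 12 + 0 = 0,$$
$$\mathbf{w}_2 \cdot \mathbf{n} = (0, 5, 4) \cdot (3, -4, 5) = 0 - 20 + 20 = 0,$$
so both these vectors lie in the plane.

(Alternatively, rather than using the vector
equation of the plane, we can check that the points
$(4, 3, 0)$ and $(0, 5, 4)$ satisfy the equation
$3x - 4y + 5z = 0$ of the plane.)

(c) We set $\mathbf{v}_1 = (4, 3, 0)$ and
$$\begin{aligned} \mathbf{v}_2 &= \mathbf{w}_2 - \left(\frac{\mathbf{v}_1 \cdot \mathbf{w}_2}{\mathbf{v}_1 \cdot \mathbf{v}_1}\right)\mathbf{v}_1 \\ &= (0, 5, 4) - \frac{(4, 3, 0) \cdot (0, 5, 4)}{(4, 3, 0) \cdot (4, 3, 0)}(4, 3, 0) \\ &= (0, 5, 4) - \tfrac{15}{25}(4, 3, 0) \\ &= (0, 5, 4) - \tfrac{3}{5}(4, 3, 0) \\ &= \left(-\tfrac{12}{5}, \tfrac{16}{5}, 4\right). \end{aligned}$$
The required orthogonal basis for the plane is
$$\left\{(4, 3, 0), \left(-\tfrac{12}{5}, \tfrac{16}{5}, 4\right)\right\}.$$

(d) An orthogonal basis for $\mathbb{R}^3$ is
$$\left\{(3, -4, 5), (4, 3, 0), \left(-\tfrac{12}{5}, \tfrac{16}{5}, 4\right)\right\}.$$


Solution to Exercise C78
We apply Theorem C35 with $\mathbf{w}_1 = (1, 0, 0, 0, 0)$,
$\mathbf{w}_2 = (0, 2, 0, 0, 0)$, $\mathbf{w}_3 = (0, 0, 1, 1, 0)$,
$\mathbf{w}_4 = (1, 1, 1, 1, 1)$ and $\mathbf{w}_5 = (1, 0, -1, 0, 1)$.
Since $\mathbf{w}_1$, $\mathbf{w}_2$ and $\mathbf{w}_3$ already form an orthogonal
set, we have
$$\mathbf{v}_1 = \mathbf{w}_1 = (1, 0, 0, 0, 0), \quad \mathbf{v}_2 = \mathbf{w}_2 = (0, 2, 0, 0, 0), \quad \mathbf{v}_3 = \mathbf{w}_3 = (0, 0, 1, 1, 0).$$
Then
$$\begin{aligned} \mathbf{v}_4 &= \mathbf{w}_4 - \left(\frac{\mathbf{v}_1 \cdot \mathbf{w}_4}{\mathbf{v}_1 \cdot \mathbf{v}_1}\right)\mathbf{v}_1 - \left(\frac{\mathbf{v}_2 \cdot \mathbf{w}_4}{\mathbf{v}_2 \cdot \mathbf{v}_2}\right)\mathbf{v}_2 - \left(\frac{\mathbf{v}_3 \cdot \mathbf{w}_4}{\mathbf{v}_3 \cdot \mathbf{v}_3}\right)\mathbf{v}_3 \\ &= (1, 1, 1, 1, 1) - \tfrac{1}{1}(1, 0, 0, 0, 0) - \tfrac{2}{4}(0, 2, 0, 0, 0) - \tfrac{2}{2}(0, 0, 1, 1, 0) \\ &= (0, 0, 0, 0, 1) \end{aligned}$$
and
$$\begin{aligned} \mathbf{v}_5 &= \mathbf{w}_5 - \left(\frac{\mathbf{v}_1 \cdot \mathbf{w}_5}{\mathbf{v}_1 \cdot \mathbf{v}_1}\right)\mathbf{v}_1 - \left(\frac{\mathbf{v}_2 \cdot \mathbf{w}_5}{\mathbf{v}_2 \cdot \mathbf{v}_2}\right)\mathbf{v}_2 - \left(\frac{\mathbf{v}_3 \cdot \mathbf{w}_5}{\mathbf{v}_3 \cdot \mathbf{v}_3}\right)\mathbf{v}_3 - \left(\frac{\mathbf{v}_4 \cdot \mathbf{w}_5}{\mathbf{v}_4 \cdot \mathbf{v}_4}\right)\mathbf{v}_4 \\ &= (1, 0, -1, 0, 1) - \tfrac{1}{1}(1, 0, 0, 0, 0) - 0 - \left(-\tfrac{1}{2}\right)(0, 0, 1, 1, 0) - \tfrac{1}{1}(0, 0, 0, 0, 1) \\ &= \left(0, 0, -\tfrac{1}{2}, \tfrac{1}{2}, 0\right). \end{aligned}$$
Thus we have the orthogonal basis
$$\left\{(1, 0, 0, 0, 0), (0, 2, 0, 0, 0), (0, 0, 1, 1, 0), (0, 0, 0, 0, 1), \left(0, 0, -\tfrac{1}{2}, \tfrac{1}{2}, 0\right)\right\}.$$


Solution to Exercise C79
(a) $(3, -4, 5) \cdot (3, -4, 5) = 9 + 16 + 25 = 50$,
so $|(3, -4, 5)| = \sqrt{50} = 5\sqrt{2}$.
(b) $(1, 2, -1, 0, 3) \cdot (1, 2, -1, 0, 3) = 1 + 4 + 1 + 0 + 9 = 15$,
so $|(1, 2, -1, 0, 3)| = \sqrt{15}$.


Solution to Exercise C80
If $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ is a non-zero vector, then
$$|\mathbf{v}| = \left(\sum_{i=1}^{n} v_i^2\right)^{1/2},$$
so the magnitude of $\mathbf{v}/|\mathbf{v}|$ is
$$\left|\frac{\mathbf{v}}{|\mathbf{v}|}\right| = \left(\sum_{i=1}^{n} \frac{v_i^2}{|\mathbf{v}|^2}\right)^{1/2} = \frac{1}{|\mathbf{v}|}\left(\sum_{i=1}^{n} v_i^2\right)^{1/2} = \frac{|\mathbf{v}|}{|\mathbf{v}|} = 1.$$


Solution to Exercise C81
We apply Strategy C13.
We have
$$|(1, 2, 1, 0)| = \sqrt{6}, \quad |(-1, 1, -1, 1)| = \sqrt{4} = 2, \quad |(1, 0, -1, 0)| = \sqrt{2}, \quad |(1, -1, 1, 3)| = \sqrt{12} = 2\sqrt{3}.$$
The required orthonormal basis for $\mathbb{R}^4$ is therefore
$$\left\{\tfrac{1}{\sqrt{6}}(1, 2, 1, 0), \tfrac{1}{2}(-1, 1, -1, 1), \tfrac{1}{\sqrt{2}}(1, 0, -1, 0), \tfrac{1}{2\sqrt{3}}(1, -1, 1, 3)\right\}.$$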
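The orthogonalisation of Theorem C35 (as used in Exercises C77(c) and C78) and the normalisation of Strategy C13 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the module: exact `fractions.Fraction` arithmetic for the orthogonal vectors, and floating point only for the final division by the magnitude, which is usually irrational. The names `gram_schmidt` and `normalise` are hypothetical; the coefficient formula follows the one in the solution to Exercise C78.

```python
import math
from fractions import Fraction


def dot(u, v):
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))


def gram_schmidt(ws):
    """Orthogonalise w_1, ..., w_k in order (they are assumed linearly independent).

    v_k = w_k - sum over i < k of ((v_i . w_k) / (v_i . v_i)) v_i
    """
    vs = []
    for w in ws:
        v = [Fraction(x) for x in w]
        for prev in vs:
            c = dot(prev, w) / dot(prev, prev)
            v = [a - c * b for a, b in zip(v, prev)]
        vs.append(tuple(v))
    return vs


def normalise(v):
    """Return v / |v| as floats (|v| is usually irrational, e.g. sqrt(6))."""
    length = math.sqrt(float(dot(v, v)))
    return tuple(float(x) / length for x in v)


ws = [(1, 0, 0, 0, 0), (0, 2, 0, 0, 0), (0, 0, 1, 1, 0),
      (1, 1, 1, 1, 1), (1, 0, -1, 0, 1)]
for v in gram_schmidt(ws):  # Exercise C78
    print(v)
```

Keeping the orthogonal vectors exact and postponing the square root to the last step mirrors the hand calculation, where the orthonormal basis of Exercise C81 is written as fractions times $1/\sqrt{6}$, $1/2$ and so on.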


Unit C3 
Linear transformations 


1 Introducing linear transformations 


Introduction 


In this unit you will study functions between vector spaces. You will begin
by looking more closely at some particular functions that have $\mathbb{R}^2$ as their
domain and codomain, such as rotations and reflections. These functions
map parallel lines to parallel lines, preserve scalar multiples and map the
zero vector to itself. Algebraically, these functions preserve the operations
of addition and scalar multiplication in the vector space $\mathbb{R}^2$. There is a
special name for functions that preserve addition and scalar multiplication
between vector spaces: they are called linear transformations. You will see
that such functions have a matrix representation. This link between linear
transformations and matrices enables us to relate the properties of
matrices to those of linear transformations. Finally, you will meet an
important result concerning linear transformations, known as the
Dimension Theorem. This theorem has a number of consequences. For
example, it enables us to show how the number of solutions of a system
of $m$ linear equations in $n$ unknowns depends on the values of $m$ and $n$.


Many results from Units C1 Linear equations and matrices and C2 Vector
spaces are used in this unit, so make sure that you understand the main
ideas of those units before starting your study of this one.


1 Introducing linear transformations 


In this section you will see that we can generalise properties of functions
that have $\mathbb{R}^2$ as their domain and codomain to functions between other
vector spaces.


1.1 What is a linear transformation? 


We begin by investigating the properties of some simple but important
functions, often called transformations, which map the vector space $\mathbb{R}^2$ to
itself. For each one, a diagram shows the effect of the transformation on
the square whose corners are at $(0, 0)$, $(0, 1)$, $(1, 1)$ and $(1, 0)$, and the
effect on the vector $(1, 1)$; part of the square is shaded for clarity.


We will investigate the following four functions: dilation, scaling, rotation 
and reflection. 


For any real number $k$, a $k$-dilation of $\mathbb{R}^2$ scales (or stretches) vectors by
a factor $k$ with respect to the origin.




Unit C3 Linear transformations 


When k = 2, the magnitude of a vector is doubled, as illustrated in 
Figure 1. 


Figure 1 A 2-dilation


When $k = \tfrac{1}{2}$, the magnitude of a vector is halved, as illustrated in
Figure 2.


Figure 2 A $\tfrac{1}{2}$-dilation
When $k$ is negative, the direction of a vector is reversed, as illustrated in
Figure 3 for the case $k = -2$.


Figure 3 A $-2$-dilation


For any real numbers $k$ and $l$, a $(k, l)$-scaling of $\mathbb{R}^2$ scales vectors by a
factor $k$ in the $x$-direction and by a factor $l$ in the $y$-direction. Figure 4
shows the effect of a $(2, 3)$-scaling.


Figure 4 A $(2, 3)$-scaling


Figure 5 shows the effect of a $(-1, 3)$-scaling.


Figure 5 A $(-1, 3)$-scaling


A rotation $r_\theta$ of $\mathbb{R}^2$ rotates vectors anticlockwise through an angle $\theta$
about the origin $(0, 0)$.


Figure 6 shows the effect of a rotation $r_{\pi/4}$.


Figure 6 A rotation $r_{\pi/4}$


Figure 7 shows the effect of a rotation $r_{\pi/2}$.


Figure 7 A rotation $r_{\pi/2}$




A reflection $q_\phi$ of $\mathbb{R}^2$ reflects vectors in the straight line through the
origin that makes an angle $\phi$ with the $x$-axis (measured anticlockwise).


Figure 8 shows the effect of a reflection $q_{\pi/4}$.


Figure 8 A reflection $q_{\pi/4}$
Figure 9 shows the effect of a reflection $q_{\pi/2}$.


Figure 9 A reflection $q_{\pi/2}$


Exercise C82 


For each of the following functions, draw a diagram to show the effect of 
the function on the rectangle with corners at (0,0), (2,0), (2,1) and (0,1), 
and on the vector (2,1). State whether the function is a dilation, a scaling, 
a rotation or a reflection. 


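(a) $t: \mathbb{R}^2 \to \mathbb{R}^2$, $(x, y) \mapsto (2x, 3y)$
(b) $t: \mathbb{R}^2 \to \mathbb{R}^2$, $(x, y) \mapsto (x, -y)$
(c) $t: \mathbb{R}^2 \to \mathbb{R}^2$, $(x, y) \mapsto (-y, x)$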


We now use matrix multiplication, from Unit C1, to obtain algebraic 
definitions of the four types of function defined geometrically above: 
dilation, scaling, rotation and reflection. 


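A $k$-dilation of $\mathbb{R}^2$ maps $(x, y)$ to $(kx, ky)$. This can be represented by


$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} k & 0 \\ 0 & k \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} kx \\ ky \end{pmatrix}.$$


A $(k, l)$-scaling of $\mathbb{R}^2$ maps $(x, y)$ to $(kx, ly)$. This can be represented by


$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} k & 0 \\ 0 & l \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} kx \\ ly \end{pmatrix}.$$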


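An algebraic definition for a rotation $r_\theta$ of $\mathbb{R}^2$ can be obtained by
considering Figure 10, where $r_\theta$ maps $(x, y)$ to $(x', y')$.


Figure 10 A rotation $r_\theta$ (through an angle of $\theta$)
It can be seen that
$$(x, y) \mapsto (x', y') = (x\cos\theta - y\sin\theta,\ x\sin\theta + y\cos\theta).$$
This can be represented by
$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{pmatrix}.$$
For example, $r_{\pi/6}$ can be represented by
$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \frac{\sqrt{3}}{2}x - \frac{1}{2}y \\ \frac{1}{2}x + \frac{\sqrt{3}}{2}y \end{pmatrix}.$$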
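Similarly, it can be shown (you will show this in Exercise C88) that a
reflection $q_\phi$ of $\mathbb{R}^2$ can be defined algebraically by


$$(x, y) \mapsto (x\cos 2\phi + y\sin 2\phi,\ x\sin 2\phi - y\cos 2\phi).$$
This can be represented by


$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} \cos 2\phi & \sin 2\phi \\ \sin 2\phi & -\cos 2\phi \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x\cos 2\phi + y\sin 2\phi \\ x\sin 2\phi - y\cos 2\phi \end{pmatrix}.$$


For example, $q_{\pi/6}$ can be represented by


$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \frac{1}{2}x + \frac{\sqrt{3}}{2}y \\ \frac{\sqrt{3}}{2}x - \frac{1}{2}y \end{pmatrix}.$$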


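We have seen that each of the four types of function can be represented by


$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} ax + by \\ cx + dy \end{pmatrix}$$


for some real numbers $a$, $b$, $c$ and $d$.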
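The four matrix representations can be checked numerically against the figures. A minimal sketch, assuming Python floating point (so values such as $\cos(\pi/2)$ are only approximately 0); the function names `dilation`, `scaling`, `rotation`, `reflection` and `apply` are hypothetical.

```python
import math


def dilation(k):
    return ((k, 0), (0, k))


def scaling(k, l):
    return ((k, 0), (0, l))


def rotation(theta):
    """Matrix of r_theta: anticlockwise rotation through theta about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))


def reflection(phi):
    """Matrix of q_phi: reflection in the line through the origin at angle phi."""
    c, s = math.cos(2 * phi), math.sin(2 * phi)
    return ((c, s), (s, -c))


def apply(matrix, v):
    """Multiply the 2 x 2 matrix ((a, b), (c, d)) by the column vector (x, y)."""
    (a, b), (c, d) = matrix
    x, y = v
    return (a * x + b * y, c * x + d * y)


# Images of (1, 1) under some of the functions in Figures 1 to 9.
for name, m in [("2-dilation", dilation(2)), ("(-1, 3)-scaling", scaling(-1, 3)),
                ("r_{pi/2}", rotation(math.pi / 2)), ("q_{pi/2}", reflection(math.pi / 2))]:
    x, y = apply(m, (1, 1))
    print(name, round(x, 10), round(y, 10))
```

Rounding in the printout only hides floating-point noise of order $10^{-16}$; the exact images are those given by the formulas above.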




The existence of a matrix representation is not the only property shared by 
these functions of the plane: they also share several striking geometric 
properties. In each of the examples, the image of the unit square is either a 
square or a rectangle; each of these functions maps straight lines to 
straight lines — indeed, each maps parallel lines to parallel lines. Any 
function that maps parallel lines to parallel lines will map parallelograms 
to parallelograms. Another geometric property shared by these four 
functions is that they also all map the origin to itself. 


Figure 11 shows the effect of a general transformation $t$ on two vectors, $\mathbf{v}_1$
and $\mathbf{v}_2$, where $t$ maps parallelograms to parallelograms and preserves the
origin.


Figure 11 Parallelograms are mapped to parallelograms


Bearing in mind the Parallelogram Law for addition of vectors from
Unit A1 Sets, functions and vectors, this illustrates that for each function $t$
in one of the four classes above, we have


$$t(\mathbf{v}_1 + \mathbf{v}_2) = t(\mathbf{v}_1) + t(\mathbf{v}_2), \quad \text{for all } \mathbf{v}_1, \mathbf{v}_2 \in \mathbb{R}^2.$$


Such a function $t$ also preserves scalar multiples, as illustrated in
Figure 12; that is, if $\alpha\mathbf{v}$ is a scalar multiple of a vector $\mathbf{v}$, then the image
of $\alpha\mathbf{v}$ under $t$ is the same scalar multiple $\alpha$ of the image of $\mathbf{v}$ under $t$.


Figure 12 Scalar multiples are preserved


We have
$$t(\alpha\mathbf{v}) = \alpha t(\mathbf{v}), \quad \text{for all } \mathbf{v} \in \mathbb{R}^2,\ \alpha \in \mathbb{R}.$$


We use these two algebraic properties to define a linear transformation 
from any vector space to another: a linear transformation is any function 
from a vector space V to a vector space W that has these two algebraic 
properties. You will see why these functions are called linear 
transformations in Subsection 1.3. 


Definition
Let $V$ and $W$ be vector spaces. A function $t: V \to W$ is a linear
transformation if it satisfies the following properties.


LT1 $t(\mathbf{v}_1 + \mathbf{v}_2) = t(\mathbf{v}_1) + t(\mathbf{v}_2)$, for all $\mathbf{v}_1, \mathbf{v}_2 \in V$.
LT2 $t(\alpha\mathbf{v}) = \alpha t(\mathbf{v})$, for all $\mathbf{v} \in V$, $\alpha \in \mathbb{R}$.


In Section 2 we show that the functions between finite-dimensional vector 
spaces that have these two properties are precisely those functions that 
have matrix representations. 


Suppose that $t: V \to W$ is a linear transformation. It follows from
property LT1 that if we know the images of two vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ under $t$,
then we can find the image of the vector $\mathbf{v}_1 + \mathbf{v}_2$. It follows from property
LT2 that if we know the image of a vector $\mathbf{v}$ under $t$, then we can find the
image of any scalar multiple of $\mathbf{v}$.


Thus, once we know the images of some vectors, we can find the images of 
more vectors by applying properties LT1 and LT2. In fact, if we know the 
image of each vector in a basis for V, then we can find the image of every 
vector in V. It is this property that makes linear transformations so 
important; we will prove it at the end of this section. 


All the functions of the plane that we have studied map the origin to itself.
In fact, any linear transformation $t: V \to W$ maps the zero vector of $V$
to the zero vector of $W$. To see this, we use property LT2 with the scalar $0$:


$$t(\mathbf{0}) = t(0\mathbf{0}) = 0\,t(\mathbf{0}) = \mathbf{0}.$$


We have proved the following result. 


Theorem C37 


Let $t: V \to W$ be a linear transformation. Then $t(\mathbf{0}) = \mathbf{0}$.


It follows from Theorem C37 that a function $t: V \to W$ with $t(\mathbf{0}) \neq \mathbf{0}$ is
not a linear transformation; for example, the function


$$\begin{aligned} t: \mathbb{R}^2 &\to \mathbb{R}^2 \\ (x, y) &\mapsto (y - 1, x) \end{aligned}$$
is not a linear transformation because 


$$t(\mathbf{0}) = t(0, 0) = (-1, 0) \neq \mathbf{0}.$$




However, a function $t$ with the property $t(\mathbf{0}) = \mathbf{0}$ is not necessarily a linear
transformation. For example, the function


$$\begin{aligned} t: \mathbb{R}^2 &\to \mathbb{R}^2 \\ (x, y) &\mapsto (x, |y|) \end{aligned}$$


satisfies $t(\mathbf{0}) = \mathbf{0}$ but is not a linear transformation. To see this, consider
the two vectors $(0, 1)$ and $(0, -1)$. LT1 is not satisfied because


$$t((0, 1) + (0, -1)) = t(0, 0) = (0, 0)$$
and
$$t(0, 1) + t(0, -1) = (0, 1) + (0, 1) = (0, 2).$$
LT2 is also not satisfied; this can be shown, for example, by taking the
vector $(0, 1)$ and $\alpha = -1$: then $t(-(0, 1)) = t(0, -1) = (0, 1)$, whereas
$-t(0, 1) = (0, -1)$.


The following strategy can be used to test whether a given function is a 
linear transformation. 


Strategy C14
To determine whether or not a given function $t: V \to W$ is a linear
transformation, do the following.
1. Check whether $t(\mathbf{0}) = \mathbf{0}$; if not, then $t$ is not a linear
transformation.
2. Check whether $t$ satisfies the following two properties.
LT1 $t(\mathbf{v}_1 + \mathbf{v}_2) = t(\mathbf{v}_1) + t(\mathbf{v}_2)$, for all $\mathbf{v}_1, \mathbf{v}_2 \in V$.
LT2 $t(\alpha\mathbf{v}) = \alpha t(\mathbf{v})$, for all $\mathbf{v} \in V$, $\alpha \in \mathbb{R}$.


The function $t$ is a linear transformation if and only if both these
properties are satisfied.


You may have noticed that if the two properties in step 2 of the strategy
both hold, then $t$ is a linear transformation and we do not also need to
check step 1. We have, however, included step 1 in the strategy as this can
provide a quick way of showing that some functions are not linear
transformations. On the other hand, if step 1 holds but either one of
properties LT1 or LT2 fails, then you do not need to check the other.


Worked Exercise C49 


Use Strategy C14 to determine whether or not each of the following 
functions is a linear transformation. 


(a) $t: \mathbb{R}^2 \to \mathbb{R}^2$, $(x, y) \mapsto (2x, y)$
(b) $t: \mathbb{R}^2 \to \mathbb{R}^2$, $(x, y) \mapsto ((x + y)^2, y^2)$


Solution 


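(a) ◁ You may notice that $t$ is a $(2, 1)$-scaling, and so expect it to
be a linear transformation. ▷
Here $t(\mathbf{0}) = \mathbf{0}$, so $t$ may be a linear transformation.
Next we check whether $t$ satisfies LT1:

$$t(\mathbf{v}_1 + \mathbf{v}_2) = t(\mathbf{v}_1) + t(\mathbf{v}_2), \quad \text{for all } \mathbf{v}_1, \mathbf{v}_2 \in \mathbb{R}^2.$$
In $\mathbb{R}^2$, let $\mathbf{v}_1 = (x_1, y_1)$ and $\mathbf{v}_2 = (x_2, y_2)$. Then


$$\begin{aligned} t(\mathbf{v}_1 + \mathbf{v}_2) &= t(x_1 + x_2, y_1 + y_2) \\ &= (2(x_1 + x_2), y_1 + y_2) \\ &= (2x_1 + 2x_2, y_1 + y_2) \end{aligned}$$
and
$$\begin{aligned} t(\mathbf{v}_1) + t(\mathbf{v}_2) &= (2x_1, y_1) + (2x_2, y_2) \\ &= (2x_1 + 2x_2, y_1 + y_2). \end{aligned}$$
These expressions are equal, so LT1 is satisfied.
Finally, we check whether $t$ satisfies LT2:
$$t(\alpha\mathbf{v}) = \alpha t(\mathbf{v}), \quad \text{for all } \mathbf{v} \in \mathbb{R}^2,\ \alpha \in \mathbb{R}.$$
Let $\mathbf{v} = (x, y)$ be a vector in $\mathbb{R}^2$ and let $\alpha \in \mathbb{R}$. Then
$$t(\alpha\mathbf{v}) = t(\alpha x, \alpha y) = (2\alpha x, \alpha y)$$
and
$$\alpha t(\mathbf{v}) = \alpha t(x, y) = \alpha(2x, y) = (2\alpha x, \alpha y).$$
These expressions are equal, so LT2 is satisfied.
Since LT1 and LT2 are satisfied, $t$ is a linear transformation.
(b) Here $t(\mathbf{0}) = \mathbf{0}$, so $t$ may be a linear transformation.
Next we check whether $t$ satisfies LT1:
$$t(\mathbf{v}_1 + \mathbf{v}_2) = t(\mathbf{v}_1) + t(\mathbf{v}_2), \quad \text{for all } \mathbf{v}_1, \mathbf{v}_2 \in \mathbb{R}^2.$$
In $\mathbb{R}^2$, let $\mathbf{v}_1 = (x_1, y_1)$ and $\mathbf{v}_2 = (x_2, y_2)$. Then
$$\begin{aligned} t(\mathbf{v}_1 + \mathbf{v}_2) &= t(x_1 + x_2, y_1 + y_2) \\ &= ((x_1 + x_2 + y_1 + y_2)^2, (y_1 + y_2)^2) \end{aligned}$$
and
$$\begin{aligned} t(\mathbf{v}_1) + t(\mathbf{v}_2) &= ((x_1 + y_1)^2, y_1^2) + ((x_2 + y_2)^2, y_2^2) \\ &= ((x_1 + y_1)^2 + (x_2 + y_2)^2, y_1^2 + y_2^2). \end{aligned}$$


These expressions are not equal in general (for example, with
$\mathbf{v}_1 = \mathbf{v}_2 = (1, 0)$ the first coordinates are $4$ and $2$), so LT1 is not satisfied.


Thus $t$ is not a linear transformation.


◁ Since property LT1 is not satisfied, there is no need to check
property LT2; however, in this case it also does not hold. ▷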
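Strategy C14 calls for a proof, but a quick numerical experiment can expose a failing property before any algebra is done. The sketch below is such a spot-check and not a substitute for the strategy: it assumes functions from $\mathbb{R}^n$ are given as Python callables on tuples, tests step 1 and then LT1 and LT2 on random integer vectors and scalars, and uses the hypothetical names `looks_linear`, `add` and `scale`.

```python
import random


def _close(u, v, tol=1e-9):
    return all(abs(a - b) <= tol * (1 + abs(a) + abs(b)) for a, b in zip(u, v))


def add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def scale(alpha, v):
    return tuple(alpha * a for a in v)


def looks_linear(t, dim, trials=200, seed=1):
    """Numerical spot-check of Strategy C14 for t: R^dim -> R^m.

    Returns False as soon as a counterexample is found (then t is definitely
    not linear). Returning True only means that no counterexample was found
    among the random integer test vectors; it is not a proof of linearity.
    """
    rng = random.Random(seed)
    # Step 1: t(0) must be the zero vector.
    if not all(abs(x) <= 1e-9 for x in t((0,) * dim)):
        return False
    # Step 2: test LT1 and LT2 on random integer vectors and scalars.
    for _ in range(trials):
        v1 = tuple(rng.randint(-10, 10) for _ in range(dim))
        v2 = tuple(rng.randint(-10, 10) for _ in range(dim))
        alpha = rng.randint(-10, 10)
        if not _close(t(add(v1, v2)), add(t(v1), t(v2))):  # LT1
            return False
        if not _close(t(scale(alpha, v1)), scale(alpha, t(v1))):  # LT2
            return False
    return True


print(looks_linear(lambda v: (2 * v[0], v[1]), 2))                 # Worked Exercise C49(a)
print(looks_linear(lambda v: ((v[0] + v[1]) ** 2, v[1] ** 2), 2))  # Worked Exercise C49(b)
```

A `False` result comes with an actual counterexample, just like the pair $(0, 1)$, $(0, -1)$ used for $(x, y) \mapsto (x, |y|)$; a `True` result still has to be backed up by checking LT1 and LT2 algebraically.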




Exercise C83 


Use Strategy C14 to determine whether or not each of the following 
functions is a linear transformation. 


(a) $t: \mathbb{R}^2 \to \mathbb{R}^2$, $(x, y) \mapsto (x + 3y, y)$
(b) $t: \mathbb{R}^2 \to \mathbb{R}^2$, $(x, y) \mapsto (x + 2, y + 1)$


In Exercise C83(a) you showed that the function
$$\begin{aligned} t: \mathbb{R}^2 &\to \mathbb{R}^2 \\ (x, y) &\mapsto (x + 3y, y) \end{aligned}$$


is a linear transformation. This function is an example of a shear, or skew,
of $\mathbb{R}^2$.


As illustrated in Figure 13, in general, a shear of $\mathbb{R}^2$ in the $x$-direction by
a factor $k$ is the linear transformation


$$\begin{aligned} t: \mathbb{R}^2 &\to \mathbb{R}^2 \\ (x, y) &\mapsto (x + ky, y). \end{aligned}$$


Figure 13 A shear in the $x$-direction by a factor of $k$
In Exercise C83(b) you showed that the function

$$\begin{aligned} t: \mathbb{R}^2 &\to \mathbb{R}^2 \\ (x, y) &\mapsto (x + 2, y + 1) \end{aligned}$$
is not a linear transformation. This function is an example of a translation
of $\mathbb{R}^2$.
As illustrated in Figure 14, in general, a translation of $\mathbb{R}^2$ by $(a, b)$ is the
function

$$\begin{aligned} t: \mathbb{R}^2 &\to \mathbb{R}^2 \\ (x, y) &\mapsto (x + a, y + b). \end{aligned}$$


Figure 14 A translation by $(a, b)$


A translation is not a linear transformation unless a = b = 0, since 
otherwise it does not map the origin to itself. 


1.2 Examples of linear transformations 


You have seen many examples of functions from $\mathbb{R}^2$ to $\mathbb{R}^2$. In general,
given any two vector spaces $V$ and $W$, we can define functions from $V$
to $W$. For example, consider the function $t$ from $\mathbb{R}^3$ to $\mathbb{R}^2$ that projects
each vector in $\mathbb{R}^3$ onto the $(x, y)$-plane, as illustrated in Figure 15:


$$\begin{aligned} t: \mathbb{R}^3 &\to \mathbb{R}^2 \\ (x, y, z) &\mapsto (x, y). \end{aligned}$$


This function is a linear transformation, as shown in the next worked
exercise.


Worked Exercise C50 


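Show that the following function $t$ from $\mathbb{R}^3$ to $\mathbb{R}^2$ is a linear transformation.
$$\begin{aligned} t: \mathbb{R}^3 &\to \mathbb{R}^2 \\ (x, y, z) &\mapsto (x, y) \end{aligned}$$


Figure 15 A projection from $\mathbb{R}^3$ onto the $(x, y)$-plane


Solution


◁ Note that the question says ‘show’, not ‘determine’; we know that
it is a linear transformation. Thus we use the definition rather than
Strategy C14 and avoid the need to check whether $t(\mathbf{0}) = \mathbf{0}$. ▷


First we show that $t$ satisfies LT1:
$$t(\mathbf{v}_1 + \mathbf{v}_2) = t(\mathbf{v}_1) + t(\mathbf{v}_2), \quad \text{for all } \mathbf{v}_1, \mathbf{v}_2 \in \mathbb{R}^3.$$
In $\mathbb{R}^3$, let $\mathbf{v}_1 = (x_1, y_1, z_1)$ and $\mathbf{v}_2 = (x_2, y_2, z_2)$. Then
$$\begin{aligned} t(\mathbf{v}_1 + \mathbf{v}_2) &= t(x_1 + x_2, y_1 + y_2, z_1 + z_2) \\ &= (x_1 + x_2, y_1 + y_2) \end{aligned}$$
and
$$\begin{aligned} t(\mathbf{v}_1) + t(\mathbf{v}_2) &= t(x_1, y_1, z_1) + t(x_2, y_2, z_2) \\ &= (x_1, y_1) + (x_2, y_2) \\ &= (x_1 + x_2, y_1 + y_2). \end{aligned}$$


These expressions are equal, so LT1 is satisfied.


Next we show that $t$ satisfies LT2:


$$t(\alpha\mathbf{v}) = \alpha t(\mathbf{v}), \quad \text{for all } \mathbf{v} \in \mathbb{R}^3,\ \alpha \in \mathbb{R}.$$


Let $\mathbf{v} = (x, y, z)$ be a vector in $\mathbb{R}^3$ and let $\alpha \in \mathbb{R}$. Then
$$t(\alpha\mathbf{v}) = t(\alpha x, \alpha y, \alpha z) = (\alpha x, \alpha y)$$

and
$$\alpha t(\mathbf{v}) = \alpha t(x, y, z) = \alpha(x, y) = (\alpha x, \alpha y).$$

These expressions are equal, so LT2 is satisfied.


Since LT1 and LT2 are satisfied, $t$ is a linear transformation.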


Worked Exercise C51 


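Determine whether or not the following function is a linear transformation.
$$\begin{aligned} t: \mathbb{R}^4 &\to \mathbb{R}^2 \\ (x, y, z, w) &\mapsto (xy, z) \end{aligned}$$


Solution
◁ The question says ‘determine’, so here we do use the strategy. ▷
We use Strategy C14.
Since $t(\mathbf{0}) = \mathbf{0}$, $t$ may be a linear transformation.
Next we check whether $t$ satisfies LT1:

$$t(\mathbf{v}_1 + \mathbf{v}_2) = t(\mathbf{v}_1) + t(\mathbf{v}_2), \quad \text{for all } \mathbf{v}_1, \mathbf{v}_2 \in \mathbb{R}^4.$$
In $\mathbb{R}^4$, let $\mathbf{v}_1 = (x_1, y_1, z_1, w_1)$ and $\mathbf{v}_2 = (x_2, y_2, z_2, w_2)$. Then

$$\begin{aligned} t(\mathbf{v}_1 + \mathbf{v}_2) &= t(x_1 + x_2, y_1 + y_2, z_1 + z_2, w_1 + w_2) \\ &= ((x_1 + x_2)(y_1 + y_2), z_1 + z_2) \end{aligned}$$

and

$$\begin{aligned} t(\mathbf{v}_1) + t(\mathbf{v}_2) &= (x_1y_1, z_1) + (x_2y_2, z_2) \\ &= (x_1y_1 + x_2y_2, z_1 + z_2). \end{aligned}$$

Since $(x_1 + x_2)(y_1 + y_2) \neq x_1y_1 + x_2y_2$ in general (for example, with
$\mathbf{v}_1 = (1, 0, 0, 0)$ and $\mathbf{v}_2 = (0, 1, 0, 0)$ the first coordinates are $1$ and $0$),
LT1 is not satisfied.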


Thus t is not a linear transformation. 


Exercise C84 


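Determine whether or not each of the following functions is a linear
transformation.
(a) $t: \mathbb{R}^2 \to \mathbb{R}^4$, $(x, y) \mapsto (x, y, x, y)$
(b) $t: \mathbb{R}^3 \to \mathbb{R}$, $(x, y, z) \mapsto z^2$
(c) $t: \mathbb{R}^3 \to \mathbb{R}^4$, $(x, y, z) \mapsto (x, y, z, 1)$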




In the previous subsection we gave an algebraic definition of a rotation $r_\theta$
of $\mathbb{R}^2$. Similarly, a rotation of $\mathbb{R}^3$ in an anticlockwise direction about the
$z$-axis through an angle $\theta$, as illustrated in Figure 16, is given by
$$\begin{aligned} t: \mathbb{R}^3 &\to \mathbb{R}^3 \\ (x, y, z) &\mapsto (x\cos\theta - y\sin\theta,\ x\sin\theta + y\cos\theta,\ z). \end{aligned}$$

Figure 16 A rotation about the $z$-axis through an angle $\theta$

Exercise C85


Show that the following function $t$ is a linear transformation.
$$\begin{aligned} t: \mathbb{R}^3 &\to \mathbb{R}^3 \\ (x, y, z) &\mapsto (x\cos\theta - y\sin\theta,\ x\sin\theta + y\cos\theta,\ z) \end{aligned}$$


So far we have considered functions $t:V\to W$ where $V=\mathbb{R}^n$ and $W=\mathbb{R}^m$ for some $m,n\in\mathbb{N}$. There are, however, many functions between other types of vector space.


Recall from Unit C2 that the vector space $P_n$ is the set of all polynomials of degree less than n, so
$$P_3=\{p(x): p(x)=a+bx+cx^2,\ a,b,c\in\mathbb{R}\},$$
$$P_2=\{p(x): p(x)=a+bx,\ a,b\in\mathbb{R}\}.$$


Worked Exercise C52 


Consider the function that maps each polynomial $p(x)=a+bx+cx^2$ in $P_3$ to its derivative $p'(x)=b+2cx$ in $P_2$:
$$t:P_3\to P_2,\quad p(x)\mapsto p'(x).$$


Determine whether or not this function is a linear transformation. 




Exercise C86 


Consider the function t from $P_3$ to itself obtained by adding to each polynomial $p(x)=a+bx+cx^2$ in $P_3$ the number $p(2)=a+2b+4c$:
$$t:P_3\to P_3,\quad p(x)\mapsto p(x)+p(2).$$


Determine whether or not this function is a linear transformation. 


There are also linear transformations of infinite-dimensional vector spaces. 
For example, let V be the vector space of all real functions. An argument 
similar to that in the solution to Exercise C86 shows that the following 
function is a linear transformation: 


$$t:V\to V,\quad f(x)\mapsto f(x)+f(2).$$


Zero transformation 


Since every vector space contains a zero vector, given any two vector 
spaces V and W, there is a particularly simple function mapping each 
vector in V to the zero vector in W: 


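$$t:V\to W,\quad \mathbf{v}\mapsto\mathbf{0}.$$

This function is a linear transformation. To show this, we first show that t satisfies LT1:
$$t(\mathbf{v}_1+\mathbf{v}_2)=t(\mathbf{v}_1)+t(\mathbf{v}_2),\quad\text{for all }\mathbf{v}_1,\mathbf{v}_2\in V.$$
Let $\mathbf{v}_1,\mathbf{v}_2\in V$. Then $\mathbf{v}_1+\mathbf{v}_2$ is also in V, so
$$t(\mathbf{v}_1+\mathbf{v}_2)=\mathbf{0}$$
and
$$t(\mathbf{v}_1)+t(\mathbf{v}_2)=\mathbf{0}+\mathbf{0}=\mathbf{0}.$$
So LT1 is satisfied.
Next we show that t satisfies LT2:
$$t(\alpha\mathbf{v})=\alpha t(\mathbf{v}),\quad\text{for all }\mathbf{v}\in V,\ \alpha\in\mathbb{R}.$$
Let $\mathbf{v}\in V$ and $\alpha\in\mathbb{R}$. Then $\alpha\mathbf{v}$ is also in V, so
$$t(\alpha\mathbf{v})=\mathbf{0}$$
and
$$\alpha t(\mathbf{v})=\alpha\mathbf{0}=\mathbf{0}.$$
So LT2 is satisfied.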


Since LT1 and LT2 are satisfied, t is a linear transformation. 


Definition 
The zero transformation from V to W is the linear transformation 


$$t:V\to W,\quad \mathbf{v}\mapsto\mathbf{0}.$$


Identity transformation 


Given a vector space V, there is another particularly simple function, this 
time from V to itself, mapping each vector in V to itself: 


$$i_V:V\to V,\quad \mathbf{v}\mapsto\mathbf{v}.$$


Exercise C87 


Show that the function $i_V$ is a linear transformation.


Definition 
The identity transformation of V is the linear transformation 
$$i_V:V\to V,\quad \mathbf{v}\mapsto\mathbf{v}.$$


We omit the subscript V when the vector space is clear from the context. 




1.3 Linear combinations of vectors 


Recall from Unit C2 that a linear combination of the vectors $\mathbf{v}_1,\dots,\mathbf{v}_n$ is an expression of the form $\alpha_1\mathbf{v}_1+\dots+\alpha_n\mathbf{v}_n$, where $\alpha_1,\dots,\alpha_n\in\mathbb{R}$. We end this section by proving that linear combinations of vectors are preserved under a linear transformation; that is, if $\mathbf{v}$ is a given linear combination of vectors $\mathbf{v}_i$, then the image of $\mathbf{v}$ is the same linear combination of the images of the vectors $\mathbf{v}_i$. This explains why these
functions are called linear transformations. In fact, some texts use this 
theorem as the definition of a linear transformation. 


Theorem C38 


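A function $t:V\to W$ is a linear transformation if and only if it satisfies

LT3 $t(\alpha_1\mathbf{v}_1+\alpha_2\mathbf{v}_2)=\alpha_1t(\mathbf{v}_1)+\alpha_2t(\mathbf{v}_2)$,

for all $\mathbf{v}_1,\mathbf{v}_2\in V$ and all $\alpha_1,\alpha_2\in\mathbb{R}$.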


Proof ▶ We start by proving the ‘only if’ part, using LT1 and LT2 to show that a linear transformation satisfies LT3. ◀

If a function $t:V\to W$ is a linear transformation, then it satisfies LT1 and LT2. We show that this implies that it satisfies LT3.

Let $\mathbf{v}_1,\mathbf{v}_2\in V$ and $\alpha_1,\alpha_2\in\mathbb{R}$. Then it follows from LT1 that
$$t(\alpha_1\mathbf{v}_1+\alpha_2\mathbf{v}_2)=t(\alpha_1\mathbf{v}_1)+t(\alpha_2\mathbf{v}_2),$$
and from LT2 that
$$t(\alpha_1\mathbf{v}_1)+t(\alpha_2\mathbf{v}_2)=\alpha_1t(\mathbf{v}_1)+\alpha_2t(\mathbf{v}_2).$$
So t satisfies the property LT3:
$$t(\alpha_1\mathbf{v}_1+\alpha_2\mathbf{v}_2)=\alpha_1t(\mathbf{v}_1)+\alpha_2t(\mathbf{v}_2),$$
for all $\mathbf{v}_1,\mathbf{v}_2\in V$ and all $\alpha_1,\alpha_2\in\mathbb{R}$.

▶ We now prove the ‘if’ part, using property LT3 to show that LT1 and LT2 are satisfied. ◀

Suppose that a function $t:V\to W$ satisfies property LT3. Then it also satisfies LT1 and LT2, since
$$t(\mathbf{v}_1+\mathbf{v}_2)=t(\mathbf{v}_1)+t(\mathbf{v}_2),\quad\text{for all }\mathbf{v}_1,\mathbf{v}_2\in V,$$
is a special case of LT3 with $\alpha_1=\alpha_2=1$, and
$$t(\alpha\mathbf{v})=\alpha t(\mathbf{v}),\quad\text{for all }\mathbf{v}\in V,\ \alpha\in\mathbb{R},$$
is a special case of LT3 with $\mathbf{v}_1=\mathbf{v}$, $\mathbf{v}_2=\mathbf{0}$, $\alpha_1=\alpha$ and $\alpha_2=0$.

Thus a function is a linear transformation if and only if it satisfies property LT3. ■




We now prove that linear combinations of any number of vectors are 
preserved under a linear transformation. 


Theorem C39 
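Let $t:V\to W$ be a linear transformation. Then
$$t(\alpha_1\mathbf{v}_1+\alpha_2\mathbf{v}_2+\dots+\alpha_n\mathbf{v}_n)=\alpha_1t(\mathbf{v}_1)+\alpha_2t(\mathbf{v}_2)+\dots+\alpha_nt(\mathbf{v}_n),$$
for all $\mathbf{v}_1,\dots,\mathbf{v}_n\in V$ and all $\alpha_1,\dots,\alpha_n\in\mathbb{R}$, $n\in\mathbb{N}$.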


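Proof ▶ We use proof by mathematical induction as in Unit A3 Mathematical language and proof, and start by writing out clearly what we take $P(n)$ to be. ◀

Let $P(n)$ be the statement
$$t(\alpha_1\mathbf{v}_1+\alpha_2\mathbf{v}_2+\dots+\alpha_n\mathbf{v}_n)=\alpha_1t(\mathbf{v}_1)+\alpha_2t(\mathbf{v}_2)+\dots+\alpha_nt(\mathbf{v}_n),$$
for all $\mathbf{v}_1,\dots,\mathbf{v}_n\in V$ and all $\alpha_1,\dots,\alpha_n\in\mathbb{R}$.
▶ Next, we carry out step 1; that is, we check that $P(1)$ holds. ◀
Since t is a linear transformation, LT2 is satisfied, so
$$t(\alpha_1\mathbf{v}_1)=\alpha_1t(\mathbf{v}_1),\quad\text{for all }\mathbf{v}_1\in V,\ \alpha_1\in\mathbb{R}.$$
Thus $P(1)$ is true.

▶ Now we proceed with step 2. We start by stating clearly our assumption, $P(k)$. ◀

We assume that $P(k)$ is true for some positive integer k; that is,
$$t(\alpha_1\mathbf{v}_1+\alpha_2\mathbf{v}_2+\dots+\alpha_k\mathbf{v}_k)=\alpha_1t(\mathbf{v}_1)+\alpha_2t(\mathbf{v}_2)+\dots+\alpha_kt(\mathbf{v}_k),$$
for all $\mathbf{v}_1,\dots,\mathbf{v}_k\in V$ and all $\alpha_1,\dots,\alpha_k\in\mathbb{R}$.
▶ We state clearly our desired conclusion, $P(k+1)$. ◀
We wish to deduce that $P(k+1)$ is true; that is,
$$t(\alpha_1\mathbf{v}_1+\alpha_2\mathbf{v}_2+\dots+\alpha_k\mathbf{v}_k+\alpha_{k+1}\mathbf{v}_{k+1})=\alpha_1t(\mathbf{v}_1)+\alpha_2t(\mathbf{v}_2)+\dots+\alpha_kt(\mathbf{v}_k)+\alpha_{k+1}t(\mathbf{v}_{k+1}).$$
Let $\mathbf{v}_1,\dots,\mathbf{v}_{k+1}\in V$ and $\alpha_1,\dots,\alpha_{k+1}\in\mathbb{R}$. We have
$$\begin{aligned}
t(\alpha_1\mathbf{v}_1+\dots+\alpha_k\mathbf{v}_k+\alpha_{k+1}\mathbf{v}_{k+1})
&=t\big((\alpha_1\mathbf{v}_1+\dots+\alpha_k\mathbf{v}_k)+\alpha_{k+1}\mathbf{v}_{k+1}\big)\\
&=t(\alpha_1\mathbf{v}_1+\dots+\alpha_k\mathbf{v}_k)+t(\alpha_{k+1}\mathbf{v}_{k+1}) &&\text{(by LT1)}\\
&=t(\alpha_1\mathbf{v}_1+\dots+\alpha_k\mathbf{v}_k)+\alpha_{k+1}t(\mathbf{v}_{k+1}) &&\text{(by LT2)}\\
&=\alpha_1t(\mathbf{v}_1)+\dots+\alpha_kt(\mathbf{v}_k)+\alpha_{k+1}t(\mathbf{v}_{k+1}) &&\text{(by }P(k)\text{)}.
\end{aligned}$$

▶ We have proved that $P(k)\Rightarrow P(k+1)$. ◀
Thus $P(k)\Rightarrow P(k+1)$, for $k=1,2,\dots$.

Hence, by the Principle of Mathematical Induction, $P(n)$ is true for all $n\in\mathbb{N}$. ■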


Figure 17 The rotation $r_\theta$


Theorem C39 is an important result. It means that, given a linear transformation $t:V\to W$ and the images of each of the vectors in a basis for V, we can determine the image of any vector in V.

Consider the linear transformation $r_\theta$ that rotates each vector in $\mathbb{R}^2$ anticlockwise through an angle $\theta$ about the origin, as illustrated in Figure 17. The standard basis for $\mathbb{R}^2$ is $\{(1,0),(0,1)\}$. From Figure 17, we can check that
$$r_\theta(1,0)=(\cos\theta,\sin\theta)\quad\text{and}\quad r_\theta(0,1)=(-\sin\theta,\cos\theta).$$
We now write each vector $(x,y)$ in $\mathbb{R}^2$ in the form
$$(x,y)=x(1,0)+y(0,1),$$
so, from Theorem C39,
$$\begin{aligned}
r_\theta(x,y)&=r_\theta\big(x(1,0)+y(0,1)\big)\\
&=x\,r_\theta(1,0)+y\,r_\theta(0,1)\\
&=x(\cos\theta,\sin\theta)+y(-\sin\theta,\cos\theta)\\
&=(x\cos\theta-y\sin\theta,\ x\sin\theta+y\cos\theta).
\end{aligned}$$


This method of finding an algebraic definition for rg is simpler than the 
geometric approach used in Subsection 1.1 and is more generally 
applicable. 
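The basis-image method translates directly into a computation. The Python sketch below is an illustration only, and the function names are hypothetical. It builds $r_\theta(x,y)$ as $x\,r_\theta(1,0)+y\,r_\theta(0,1)$, as Theorem C39 allows, and compares the result with the closed-form expression.

```python
import math

def rotate_via_basis(theta, x, y):
    """Image of (x, y) under r_theta, built from the images of the standard
    basis vectors: r(x, y) = x * r(1, 0) + y * r(0, 1)."""
    r_e1 = (math.cos(theta), math.sin(theta))     # r_theta(1, 0)
    r_e2 = (-math.sin(theta), math.cos(theta))    # r_theta(0, 1)
    return (x * r_e1[0] + y * r_e2[0], x * r_e1[1] + y * r_e2[1])

def rotate_formula(theta, x, y):
    """The closed form (x cos(theta) - y sin(theta), x sin(theta) + y cos(theta))."""
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

theta = math.pi / 6
for (x, y) in [(1.0, 0.0), (2.0, -3.0), (0.5, 4.0)]:
    a = rotate_via_basis(theta, x, y)
    b = rotate_formula(theta, x, y)
    # Compare with a tolerance, since the results are floating-point numbers.
    assert all(math.isclose(p, q, abs_tol=1e-12) for p, q in zip(a, b))
```

Only the two images $r_\theta(1,0)$ and $r_\theta(0,1)$ are needed. Every other image follows from linearity.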


Exercise C88 


Find the image of a vector $(x,y)$ in $\mathbb{R}^2$ under the reflection $q_\phi$, given that
$$q_\phi(1,0)=(\cos2\phi,\sin2\phi)\quad\text{and}\quad q_\phi(0,1)=(\sin2\phi,-\cos2\phi).$$


2 Matrices of linear transformations 


In this section you will see how the images of basis vectors can be used to 
find the matrix representation of a linear transformation. 


2.1 Finding matrix representations 


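In Section 1 you met several examples of matrix representations of linear transformations. For example, you saw that a k-dilation of $\mathbb{R}^2$ can be represented by
$$\begin{pmatrix}x\\y\end{pmatrix}\mapsto\begin{pmatrix}k&0\\0&k\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}kx\\ky\end{pmatrix}$$
and a rotation $r_\theta$ of $\mathbb{R}^2$ can be represented by
$$\begin{pmatrix}x\\y\end{pmatrix}\mapsto\begin{pmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}x\cos\theta-y\sin\theta\\x\sin\theta+y\cos\theta\end{pmatrix}.$$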




In this section we show that any linear transformation $t:V\to W$ between finite-dimensional vector spaces has a matrix representation
$$\mathbf{v}^T\mapsto A\mathbf{v}^T=\mathbf{w}^T.$$
Matrix representations are important because they are an aid to performing calculations with linear transformations; in particular, they are easily handled by computers.


You have seen that it is sometimes convenient to use a non-standard basis $E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\}$ for a vector space V. Recall from Unit C2 that if $\mathbf{v}$ is a vector in V and
$$\mathbf{v}=v_1\mathbf{e}_1+\dots+v_n\mathbf{e}_n,$$
then the numbers $v_1,\dots,v_n$ are the coordinates of $\mathbf{v}$ with respect to the basis E (the E-coordinates of $\mathbf{v}$). The E-coordinate representation of $\mathbf{v}$ is
$$\mathbf{v}_E=(v_1,\dots,v_n)_E.$$


For example, let E be the basis $\{(1,1),(1,0)\}$ for $\mathbb{R}^2$. The vector $\mathbf{v}=(5,2)$ in $\mathbb{R}^2$ can be written as
$$\mathbf{v}=2(1,1)+3(1,0),$$
so the E-coordinate representation of $\mathbf{v}$ is
$$\mathbf{v}_E=(2,3)_E.$$


For another example, consider the basis $E=\{1+x^2,\ x^2,\ 2-x\}$ for the vector space $P_3$. As
$$1+x+2x^2=3(1+x^2)-x^2-(2-x),$$
the E-coordinate representation of the polynomial $1+x+2x^2$ is $(3,-1,-1)_E$.
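Finding E-coordinates amounts to solving a system of linear equations in which the basis vectors form the columns. Here is a minimal Python sketch of that idea. It assumes that vectors in $\mathbb{R}^n$ are given as tuples and that a polynomial $a+bx+cx^2$ in $P_3$ is given by its coefficient vector $(a,b,c)$. The function name `coordinates` is hypothetical. It uses Gauss-Jordan elimination with exact fractions, so no rounding occurs.

```python
from fractions import Fraction

def coordinates(basis, v):
    """Coordinates of v with respect to an ordered basis of R^n.

    Solves c_1 e_1 + ... + c_n e_n = v by Gauss-Jordan elimination on the
    augmented matrix whose columns are the basis vectors, using exact
    fractions. Raises ValueError if the vectors do not form a basis.
    """
    n = len(v)
    # Row i holds the i-th entry of every basis vector, followed by v_i.
    m = [[Fraction(e[i]) for e in basis] + [Fraction(v[i])] for i in range(n)]
    for col in range(n):
        # Find a row with a non-zero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the given vectors do not form a basis")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [entry / p for entry in m[col]]      # make the pivot 1
        for r in range(n):                            # clear the rest of the column
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][n] for i in range(n))

# v = (5, 2) with respect to E = {(1, 1), (1, 0)}
print(coordinates([(1, 1), (1, 0)], (5, 2)))
# 1 + x + 2x^2 with respect to E = {1 + x^2, x^2, 2 - x}, as coefficient vectors
print(coordinates([(1, 0, 1), (0, 0, 1), (2, -1, 0)], (1, 1, 2)))
```

The two calls print the coordinate tuples $(2,3)$ and $(3,-1,-1)$, shown as `Fraction` objects. These agree with the two examples above.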

The following exercises should remind you how to write a vector in terms 
of its coordinates with respect to a given basis. 


Exercise C89 


Find the E-coordinate representation of the vector $\mathbf{v}=(3,1)$ in $\mathbb{R}^2$ for each of the following bases E for $\mathbb{R}^2$.

(a) $E=\{(3,1),(2,1)\}$   (b) $E=\{(1,2),(2,1)\}$




Exercise C90 


Find the E-coordinate representation of the polynomial $p(x)=2+3x$ in $P_2$ for each of the following bases E for $P_2$.

(a) $E=\{1,x\}$ (the standard basis)   (b) $E=\{1,\ 4+6x\}$
(c) $E=\{2x,\ 1+4x\}$


We now define a matrix representation of a linear transformation between 
finite-dimensional vector spaces, with respect to specified bases. 


Definition 
Let V and W be vector spaces of dimensions n and m, respectively. Let $t:V\to W$ be a linear transformation, let $E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\}$ be a basis for V, let $F=\{\mathbf{f}_1,\dots,\mathbf{f}_m\}$ be a basis for W and let A be an $m\times n$ matrix such that
$$t(\mathbf{v})_F=A\mathbf{v}_E,\quad\text{for each vector }\mathbf{v}\text{ in }V.$$
Then $\mathbf{v}_E\mapsto A\mathbf{v}_E=t(\mathbf{v})_F$ is the matrix representation of t with respect to the bases E and F, and A is the matrix of t with respect to the bases E and F.


Remarks 


1. A matrix of a linear transformation from an n-dimensional vector space to an m-dimensional vector space is an $m\times n$ matrix, not an $n\times m$ matrix as you might expect.

2. Strictly speaking, since we defined vectors as row vectors, we should write $\mathbf{v}_E^T\mapsto A\mathbf{v}_E^T=t(\mathbf{v})_F^T$. However, we omit the transpose symbols for simplicity, and we often write these vectors as row vectors to save space.

3. When $E=F$, we refer to the matrix representation with respect to the basis E.


Later in this section we will prove that there is exactly one matrix of t 
with respect to the bases E and F, but first we develop a strategy 
(Strategy C15) for finding the matrix of a linear transformation. 




Matrix representations using standard bases 


We start by considering linear transformations where both E and F are 
the standard basis. 


Exercise C91 


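Each of the following linear transformations $t:\mathbb{R}^2\to\mathbb{R}^2$ is defined by a matrix representation with respect to the standard basis $\{(1,0),(0,1)\}$ for $\mathbb{R}^2$. In each case, find the images of the vectors $(1,0)$ and $(0,1)$. What do you notice about the relationship between the vectors $t(1,0)$ and $t(0,1)$ and the $2\times2$ matrix of the linear transformation?

(a) A $(3,2)$-scaling of $\mathbb{R}^2$:
$$t:\mathbb{R}^2\to\mathbb{R}^2,\quad \begin{pmatrix}x\\y\end{pmatrix}\mapsto\begin{pmatrix}3&0\\0&2\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}.$$
(b) A rotation $r_{\pi/4}$ of $\mathbb{R}^2$:
$$t:\mathbb{R}^2\to\mathbb{R}^2,\quad \begin{pmatrix}x\\y\end{pmatrix}\mapsto\begin{pmatrix}1/\sqrt2&-1/\sqrt2\\1/\sqrt2&1/\sqrt2\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}.$$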


In Exercise C91 you saw two examples in which, given a transformation 
defined by a matrix, the coordinates of the images of the standard basis 
vectors of the domain were the columns of the matrix. It turns out that 
this is always the case, even for non-standard bases: the coordinates of the 
images of the basis vectors of the domain are the columns of the matrix. 
This gives a strategy for finding the matrix of a linear transformation 
between any two finite-dimensional vector spaces with respect to any bases 
for the domain and codomain. 


Strategy C15 


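To find the matrix A of a linear transformation $t:V\to W$ with respect to the basis $E=\{\mathbf{e}_1,\mathbf{e}_2,\dots,\mathbf{e}_n\}$ for V, and the basis $F=\{\mathbf{f}_1,\mathbf{f}_2,\dots,\mathbf{f}_m\}$ for W, do the following.

1. Find $t(\mathbf{e}_1),t(\mathbf{e}_2),\dots,t(\mathbf{e}_n)$.

2. Find the F-coordinates of each of these image vectors.
$$\begin{aligned}t(\mathbf{e}_1)&=(a_{11},a_{21},\dots,a_{m1})_F\\t(\mathbf{e}_2)&=(a_{12},a_{22},\dots,a_{m2})_F\\&\ \,\vdots\\t(\mathbf{e}_n)&=(a_{1n},a_{2n},\dots,a_{mn})_F\end{aligned}$$

3. Construct the matrix A column by column.
$$A=\begin{pmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}\end{pmatrix}$$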




We first illustrate the strategy with some exercises and then prove that it 
works later in this section. 
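The three steps of Strategy C15 can be sketched in Python as follows. The sketch assumes that V and W are $\mathbb{R}^n$ and $\mathbb{R}^m$, with vectors given as tuples; a polynomial space can be handled the same way through coefficient vectors. The names `coordinates` and `matrix_of` are hypothetical helpers, not part of the module. F-coordinates are found by Gauss-Jordan elimination with exact fractions.

```python
from fractions import Fraction

def coordinates(basis, w):
    """F-coordinates of w: solve c_1 f_1 + ... + c_m f_m = w exactly
    by Gauss-Jordan elimination (the basis vectors f_j are the columns)."""
    m = len(w)
    rows = [[Fraction(f[i]) for f in basis] + [Fraction(w[i])] for i in range(m)]
    for col in range(m):
        pivot = next((r for r in range(col, m) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("F is not a basis")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [x / p for x in rows[col]]
        for r in range(m):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[m] for row in rows]

def matrix_of(t, E, F):
    """Strategy C15: the matrix of t with respect to ordered bases E and F."""
    # Steps 1 and 2: the image of each domain basis vector, in F-coordinates.
    columns = [coordinates(F, t(e)) for e in E]
    # Step 3: these coordinate vectors are the columns of A, in the order of E.
    return [[col[i] for col in columns] for i in range(len(F))]

def t(v):
    """t(x, y) = (2x, 3x + y, y), as in Worked Exercise C53(a)."""
    x, y = v
    return (2 * x, 3 * x + y, y)

standard2 = [(1, 0), (0, 1)]
standard3 = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(matrix_of(t, standard2, standard3))
```

With the standard bases this gives the matrix with rows $(2,0)$, $(3,1)$, $(0,1)$. With the bases of Worked Exercise C56, `matrix_of(t, [(1, 1), (1, 0)], [(1, 1, 1), (0, 1, 1), (0, 0, 1)])` should reproduce the matrix with rows $(2,2)$, $(2,1)$, $(-3,-3)$. Reordering E permutes the columns, which illustrates why the order of basis elements matters.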


Worked Exercise C53 


For each of the following linear transformations t, find the matrix representation of t with respect to the standard bases for the domain and codomain.

(a) $t:\mathbb{R}^2\to\mathbb{R}^3,\ (x,y)\mapsto(2x,\ 3x+y,\ y)$   (b) $t:P_3\to P_2,\ p(x)\mapsto p'(x)$
Solution 


(a) We use Strategy C15.
▶ The standard basis for $\mathbb{R}^2$ is $\{(1,0),(0,1)\}$. ◀
We find the images of the vectors in the domain basis $E=\{(1,0),(0,1)\}$:
$$t(1,0)=(2,3,0),\quad t(0,1)=(0,1,1).$$
▶ The standard basis for $\mathbb{R}^3$ is $\{(1,0,0),(0,1,0),(0,0,1)\}$. There is really nothing to do here when the basis in the codomain is the standard basis, since the images are already expressed with respect to this basis; we show the working here for completeness. ◀

We find the F-coordinates of each of these image vectors, where $F=\{(1,0,0),(0,1,0),(0,0,1)\}$:
$$t(1,0)=(2,3,0)_F,\quad t(0,1)=(0,1,1)_F.$$

▶ We now construct the matrix of t by writing down the coordinates of the image vectors column by column, keeping the columns in the same order as the corresponding domain basis vectors. ◀

Hence the matrix of t with respect to the standard bases for the domain and codomain is
$$A=\begin{pmatrix}2&0\\3&1\\0&1\end{pmatrix}.$$
Thus the matrix representation of t with respect to these bases is
$$\begin{pmatrix}x\\y\end{pmatrix}_E\mapsto\begin{pmatrix}2&0\\3&1\\0&1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}_E=\begin{pmatrix}2x\\3x+y\\y\end{pmatrix}_F.$$

▶ We have included the subscripts E and F here, but will often omit these where the bases are the standard ones. ◀




(b) ▶ The standard basis for $P_3$ is $\{1,x,x^2\}$. Therefore the three basis vectors and their derivatives are as follows: $p_1(x)=1$ and $p_1'(x)=0$, $p_2(x)=x$ and $p_2'(x)=1$, $p_3(x)=x^2$ and $p_3'(x)=2x$. ◀

We find the images of the vectors in the domain basis $E=\{1,x,x^2\}$:
$$t(1)=0,\quad t(x)=1,\quad t(x^2)=2x.$$

▶ The standard basis for $P_2$ is $\{1,x\}$. We notice that $0=0+0x$, $1=1+0x$ and $2x=0+2x$. ◀

We find the F-coordinates of each of these image vectors, where $F=\{1,x\}$:
$$t(1)=(0,0)_F,\quad t(x)=(1,0)_F,\quad t(x^2)=(0,2)_F.$$
▶ We keep the columns in this order. ◀

Hence the matrix of t with respect to the standard bases for the domain and codomain is
$$A=\begin{pmatrix}0&1&0\\0&0&2\end{pmatrix}.$$
Thus the matrix representation of t with respect to these bases for $P_3$ and $P_2$ is
$$\begin{pmatrix}a\\b\\c\end{pmatrix}_E\mapsto\begin{pmatrix}0&1&0\\0&0&2\end{pmatrix}\begin{pmatrix}a\\b\\c\end{pmatrix}_E=\begin{pmatrix}b\\2c\end{pmatrix}_F.$$

▶ We have $t(a+bx+cx^2)=b+2cx$. ◀
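If each polynomial $a_0+a_1x+\dots+a_kx^k$ is represented by its coefficient list, part (b) is easy to check by computer. This Python sketch assumes that representation, and the helper names `derivative` and `apply_matrix` are hypothetical. Because the codomain basis $\{1,x\}$ is the standard one, the coefficient list of each image is already its F-coordinate vector.

```python
def derivative(coeffs):
    """Derivative of a_0 + a_1 x + ... + a_k x^k, given as [a_0, ..., a_k].
    The result has one fewer coefficient (a map from P_n to P_(n-1))."""
    return [j * coeffs[j] for j in range(1, len(coeffs))]

# Standard basis {1, x, x^2} of P_3 as coefficient lists.
E = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Images t(1), t(x), t(x^2); with F = {1, x} these are already F-coordinates.
columns = [derivative(e) for e in E]
# Arrange the coordinate vectors as the columns of a 2 x 3 matrix.
A = [[col[i] for col in columns] for i in range(2)]
print(A)  # [[0, 1, 0], [0, 0, 2]]

def apply_matrix(A, v):
    """Matrix-vector product Av, with A stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# t(7 + 5x + 3x^2) = 5 + 6x
print(apply_matrix(A, [7, 5, 3]))  # [5, 6]
```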


Exercise C92 


For each of the following linear transformations t, find the matrix representation of t with respect to the standard bases for the domain and codomain.

(a) $t:\mathbb{R}^2\to\mathbb{R}^2,\ (x,y)\mapsto(x+3y,\ y)$   (b) $t:P_3\to P_3,\ p(x)\mapsto p(x)+p(2)$
(c) $t:\mathbb{R}^2\to\mathbb{R}^4,\ (x,y)\mapsto(x,y,x,y)$   (d) $t:\mathbb{R}^3\to\mathbb{R}^2,\ (x,y,z)\mapsto(x,y)$




Matrix representations using non-standard bases 


So far we have used the strategy to find matrix representations with 
respect to standard bases. We now use the strategy to find matrix 
representations with respect to other bases. 


We start with a non-standard basis for the domain and the standard basis 
for the codomain. 


Worked Exercise C54 


Find the matrix representation of the linear transformation 
$$t:\mathbb{R}^3\to\mathbb{R}^2,\quad (x,y,z)\mapsto(x,y)$$
with respect to the non-standard domain basis $E=\{(1,1,1),(1,1,0),(1,0,0)\}$ and the standard codomain basis $F=\{(1,0),(0,1)\}$.

Solution
We use Strategy C15.

We find the images of the vectors in the domain basis $E=\{(1,1,1),(1,1,0),(1,0,0)\}$:
$$t(1,1,1)=(1,1),\quad t(1,1,0)=(1,1),\quad t(1,0,0)=(1,0).$$
We find the F-coordinates of each of these image vectors, where $F=\{(1,0),(0,1)\}$:
$$t(1,1,1)=(1,1)_F,\quad t(1,1,0)=(1,1)_F,\quad t(1,0,0)=(1,0)_F.$$
▶ We keep the columns in this order. ◀

Hence the matrix of t with respect to the bases E and F is
$$A=\begin{pmatrix}1&1&1\\1&1&0\end{pmatrix}.$$
Thus the matrix representation of t with respect to the non-standard basis E for $\mathbb{R}^3$ and the standard basis F for $\mathbb{R}^2$ is
$$\begin{pmatrix}v_1\\v_2\\v_3\end{pmatrix}_E\mapsto\begin{pmatrix}1&1&1\\1&1&0\end{pmatrix}\begin{pmatrix}v_1\\v_2\\v_3\end{pmatrix}_E=\begin{pmatrix}v_1+v_2+v_3\\v_1+v_2\end{pmatrix}_F.$$

▶ Using $v_1$, $v_2$ and $v_3$ instead of x, y and z helps emphasise that these are coordinates with respect to a non-standard basis. ◀


Compare the matrix representation in Worked Exercise C54 to that found in Exercise C92(d) for this linear transformation with respect to the standard basis in both the domain and the codomain:
$$\begin{pmatrix}x\\y\\z\end{pmatrix}\mapsto\begin{pmatrix}1&0&0\\0&1&0\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}x\\y\end{pmatrix}.$$


In general, different bases give different matrix representations. 


We now consider a non-standard basis in the codomain while keeping to 
the standard basis in the domain. 


Worked Exercise C55 


Find the matrix representation of the linear transformation 
$$t:\mathbb{R}^2\to\mathbb{R}^4,\quad (x,y)\mapsto(x,y,x,y)$$
with respect to the standard domain basis $E=\{(1,0),(0,1)\}$ and the non-standard codomain basis $F=\{(1,0,0,0),(1,1,0,0),(1,1,1,0),(1,1,1,1)\}$.

Solution
We use Strategy C15.
We find the images of the vectors in the domain basis $E=\{(1,0),(0,1)\}$:
$$t(1,0)=(1,0,1,0),\quad t(0,1)=(0,1,0,1).$$
▶ We now write these image vectors in terms of their coordinates with respect to the codomain basis; this requires some work! ◀

We find the F-coordinates of each of these image vectors, where $F=\{(1,0,0,0),(1,1,0,0),(1,1,1,0),(1,1,1,1)\}$.

For the first image vector, we need $a,b,c,d\in\mathbb{R}$ such that
$$(1,0,1,0)=(a,b,c,d)_F.$$
Since
$$\begin{aligned}(a,b,c,d)_F&=a(1,0,0,0)+b(1,1,0,0)+c(1,1,1,0)+d(1,1,1,1)\\&=(a+b+c+d,\ b+c+d,\ c+d,\ d),\end{aligned}$$
by equating coordinates we obtain the following system
$$\begin{aligned}a+b+c+d&=1\\b+c+d&=0\\c+d&=1\\d&=0.\end{aligned}$$
Solving, we have $d=0$, $c=1$, $b=-1$ and $a=1$, so
$$(1,0,1,0)=(1,-1,1,0)_F.$$




Therefore
$$t(1,0)=(1,-1,1,0)_F.$$

▶ We have found the F-coordinates of the image of the first domain basis vector. If the equations had been more difficult to solve we could have used Gauss-Jordan elimination as we did in Unit C1. ◀

For the second image vector, we need $e,f,g,h\in\mathbb{R}$ such that
$$(0,1,0,1)=(e,f,g,h)_F.$$
Since
$$\begin{aligned}(e,f,g,h)_F&=e(1,0,0,0)+f(1,1,0,0)+g(1,1,1,0)+h(1,1,1,1)\\&=(e+f+g+h,\ f+g+h,\ g+h,\ h),\end{aligned}$$
by equating coordinates we obtain the system
$$\begin{aligned}e+f+g+h&=0\\f+g+h&=1\\g+h&=0\\h&=1.\end{aligned}$$
Solving, we have $h=1$, $g=-1$, $f=1$ and $e=-1$, so
$$(0,1,0,1)=(-1,1,-1,1)_F.$$
Therefore
$$t(0,1)=(-1,1,-1,1)_F.$$
▶ We keep the columns in this order. ◀

Hence the matrix of t with respect to the bases E and F is
$$A=\begin{pmatrix}1&-1\\-1&1\\1&-1\\0&1\end{pmatrix}.$$
Thus the matrix representation of t with respect to the standard basis E for $\mathbb{R}^2$ and the non-standard basis F for $\mathbb{R}^4$ is
$$\begin{pmatrix}v_1\\v_2\end{pmatrix}_E\mapsto\begin{pmatrix}1&-1\\-1&1\\1&-1\\0&1\end{pmatrix}\begin{pmatrix}v_1\\v_2\end{pmatrix}_E=\begin{pmatrix}v_1-v_2\\-v_1+v_2\\v_1-v_2\\v_2\end{pmatrix}_F.$$
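As a check on these coordinates, each F-coordinate vector can be converted back into standard coordinates by forming the corresponding linear combination of the vectors in F. The result should be the image vector we started from:
$$(1,-1,1,0)_F=1(1,0,0,0)-1(1,1,0,0)+1(1,1,1,0)+0(1,1,1,1)=(1,0,1,0)=t(1,0),$$
$$(-1,1,-1,1)_F=-1(1,0,0,0)+1(1,1,0,0)-1(1,1,1,0)+1(1,1,1,1)=(0,1,0,1)=t(0,1).$$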


Compare the matrix representation in Worked Exercise C55 to that found in Exercise C92(c) for this linear transformation with respect to the standard bases in both the domain and codomain:
$$\begin{pmatrix}x\\y\end{pmatrix}\mapsto\begin{pmatrix}1&0\\0&1\\1&0\\0&1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}x\\y\\x\\y\end{pmatrix}.$$


Finally, we look at an example with non-standard bases for both the 
domain and the codomain. 


Worked Exercise C56 


Find the matrix representation of the linear transformation 
$$t:\mathbb{R}^2\to\mathbb{R}^3,\quad (x,y)\mapsto(2x,\ 3x+y,\ y)$$
with respect to the non-standard domain basis $E=\{(1,1),(1,0)\}$ and the non-standard codomain basis $F=\{(1,1,1),(0,1,1),(0,0,1)\}$.

Solution
We use Strategy C15.
We find the images of the domain basis vectors $E=\{(1,1),(1,0)\}$:
$$t(1,1)=(2,4,1),\quad t(1,0)=(2,3,0).$$
We find the F-coordinates of each of these image vectors, where $F=\{(1,1,1),(0,1,1),(0,0,1)\}$.
For the first image vector we need $a,b,c\in\mathbb{R}$ such that
$$(2,4,1)=(a,b,c)_F.$$
Since
$$\begin{aligned}(a,b,c)_F&=a(1,1,1)+b(0,1,1)+c(0,0,1)\\&=(a,\ a+b,\ a+b+c),\end{aligned}$$
by equating coordinates we obtain the system
$$\begin{aligned}a&=2\\a+b&=4\\a+b+c&=1.\end{aligned}$$
Solving, we have $a=2$, $b=2$ and $c=-3$, so
$$(2,4,1)=(2,2,-3)_F.$$
Therefore
$$t(1,1)=(2,2,-3)_F.$$
For the second image vector we need $d,e,f\in\mathbb{R}$ such that
$$(2,3,0)=(d,e,f)_F.$$




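Since
$$\begin{aligned}(d,e,f)_F&=d(1,1,1)+e(0,1,1)+f(0,0,1)\\&=(d,\ d+e,\ d+e+f),\end{aligned}$$
by equating coordinates we obtain the following system
$$\begin{aligned}d&=2\\d+e&=3\\d+e+f&=0.\end{aligned}$$
Solving, we have $d=2$, $e=1$ and $f=-3$, so
$$(2,3,0)=(2,1,-3)_F.$$
Therefore
$$t(1,0)=(2,1,-3)_F.$$

Hence the matrix of t with respect to the bases E and F is
$$A=\begin{pmatrix}2&2\\2&1\\-3&-3\end{pmatrix}.$$
Thus the matrix representation of t with respect to the non-standard basis E for $\mathbb{R}^2$ and the non-standard basis F for $\mathbb{R}^3$ is
$$\begin{pmatrix}v_1\\v_2\end{pmatrix}_E\mapsto\begin{pmatrix}2&2\\2&1\\-3&-3\end{pmatrix}\begin{pmatrix}v_1\\v_2\end{pmatrix}_E=\begin{pmatrix}2v_1+2v_2\\2v_1+v_2\\-3v_1-3v_2\end{pmatrix}_F.$$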


Compare the matrix representation in Worked Exercise C56 to that found in Worked Exercise C53(a) for this linear transformation with respect to the standard bases in both the domain and codomain:
$$\begin{pmatrix}x\\y\end{pmatrix}\mapsto\begin{pmatrix}2&0\\3&1\\0&1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}2x\\3x+y\\y\end{pmatrix}.$$


The following exercise involves both standard and non-standard bases in 
the domain and codomain. 


Exercise C93 


Find the matrix of the linear transformation 
$$t:\mathbb{R}^3\to\mathbb{R}^2,\quad (x,y,z)\mapsto(x,y)$$
with respect to each of the following bases E for $\mathbb{R}^3$ and F for $\mathbb{R}^2$.

(a) $E=\{(1,0,1),(1,0,0),(1,1,1)\}$, $F=\{(1,0),(0,1)\}$ (standard basis for $\mathbb{R}^2$)
(b) $E=\{(1,0,0),(0,1,0),(0,0,1)\}$ (standard basis for $\mathbb{R}^3$), $F=\{(2,1),(1,1)\}$
(c) $E=\{(0,1,0),(1,1,1),(0,1,1)\}$, $F=\{(1,3),(2,4)\}$


You have seen that a linear transformation $t:V\to W$ has different matrix representations depending on the bases used for the domain and codomain. Moreover, the order of the elements in a basis is important. For example, in the next exercise you should obtain different matrices for t for each part: although the bases contain the same elements, the order in which they appear in the domain basis is different.

In summary, note the following two facts.

• Different bases for V and W give different matrix representations.

• A different order of basis elements gives a different matrix representation.


Exercise C94 


Find the matrix representation of the linear transformation 
$$t:P_3\to P_2,\quad p(x)\mapsto p'(x)$$
with respect to each of the following bases E for $P_3$ and F for $P_2$.

(a) $E=\{1,x,x^2\}$, $F=\{2x,\ 1+x\}$
(b) $E=\{x,x^2,1\}$, $F=\{2x,\ 1+x\}$


The unique matrix representation of a linear transformation

You have seen that the matrix representation of a linear transformation depends on the bases for both the domain and the codomain, and on the order of these basis elements. However, for given ordered bases, there is precisely one matrix representation: the one given by Strategy C15. Using the notation in the strategy, the unique matrix representation of a linear transformation t with respect to the bases E and F is
$$\begin{pmatrix}v_1\\v_2\\\vdots\\v_n\end{pmatrix}_E\mapsto\begin{pmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}\end{pmatrix}\begin{pmatrix}v_1\\v_2\\\vdots\\v_n\end{pmatrix}_E=\begin{pmatrix}a_{11}v_1+\dots+a_{1n}v_n\\a_{21}v_1+\dots+a_{2n}v_n\\\vdots\\a_{m1}v_1+\dots+a_{mn}v_n\end{pmatrix}_F.$$




We now prove this result. If you are short of time, you should skim 
through this proof and come back to it when time permits. 


Theorem C40 


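Let $t:V\to W$ be a linear transformation, let $E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\}$ be a basis for V and let $F=\{\mathbf{f}_1,\dots,\mathbf{f}_m\}$ be a basis for W. Let
$$\begin{aligned}t(\mathbf{e}_1)&=(a_{11},a_{21},\dots,a_{m1})_F,\\t(\mathbf{e}_2)&=(a_{12},a_{22},\dots,a_{m2})_F,\\&\ \,\vdots\\t(\mathbf{e}_n)&=(a_{1n},a_{2n},\dots,a_{mn})_F.\end{aligned}$$
Then there is exactly one matrix of t with respect to the bases E and F, namely
$$A=\begin{pmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}\end{pmatrix}.$$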


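Proof ▶ We start by showing that A is a matrix of t with respect to the (ordered) bases E and F. ◀

Suppose that the conditions of the theorem are satisfied and that $(v_1,\dots,v_n)_E$ is the E-coordinate representation of a vector $\mathbf{v}\in V$. Then we have
$$\mathbf{v}=v_1\mathbf{e}_1+v_2\mathbf{e}_2+\dots+v_n\mathbf{e}_n.$$
By Theorem C39, linear transformations preserve linear combinations of vectors, so
$$\begin{aligned}t(\mathbf{v})&=t(v_1\mathbf{e}_1+v_2\mathbf{e}_2+\dots+v_n\mathbf{e}_n)\\&=v_1t(\mathbf{e}_1)+v_2t(\mathbf{e}_2)+\dots+v_nt(\mathbf{e}_n)\\&=v_1(a_{11},a_{21},\dots,a_{m1})_F+v_2(a_{12},a_{22},\dots,a_{m2})_F+\dots+v_n(a_{1n},a_{2n},\dots,a_{mn})_F\\&=(a_{11}v_1+\dots+a_{1n}v_n,\ a_{21}v_1+\dots+a_{2n}v_n,\ \dots,\ a_{m1}v_1+\dots+a_{mn}v_n)_F.\end{aligned}$$

So the first coordinate of $t(\mathbf{v})$ is $a_{11}v_1+\dots+a_{1n}v_n$, the second coordinate of $t(\mathbf{v})$ is $a_{21}v_1+\dots+a_{2n}v_n$, and so on. These coordinates can be obtained by matrix multiplication as follows:
$$\begin{pmatrix}a_{11}v_1+\dots+a_{1n}v_n\\a_{21}v_1+\dots+a_{2n}v_n\\\vdots\\a_{m1}v_1+\dots+a_{mn}v_n\end{pmatrix}=\begin{pmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}\end{pmatrix}\begin{pmatrix}v_1\\v_2\\\vdots\\v_n\end{pmatrix}.$$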




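Therefore
$$\begin{pmatrix}v_1\\v_2\\\vdots\\v_n\end{pmatrix}_E\mapsto\begin{pmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}\end{pmatrix}\begin{pmatrix}v_1\\v_2\\\vdots\\v_n\end{pmatrix}_E=\begin{pmatrix}a_{11}v_1+\dots+a_{1n}v_n\\a_{21}v_1+\dots+a_{2n}v_n\\\vdots\\a_{m1}v_1+\dots+a_{mn}v_n\end{pmatrix}_F$$
is a matrix representation of t with respect to the bases E and F, and
$$A=\begin{pmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}\end{pmatrix}$$
is a matrix of t with respect to the bases E and F.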


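▶ We now show that A is the only possible matrix of t with respect to the (ordered) bases E and F. We do this by assuming that there is another possible matrix B and concluding that B must be equal to A. ◀

Suppose that B is also a matrix of t with respect to the bases E and F, where
$$B=\begin{pmatrix}b_{11}&b_{12}&\cdots&b_{1n}\\b_{21}&b_{22}&\cdots&b_{2n}\\\vdots&\vdots&&\vdots\\b_{m1}&b_{m2}&\cdots&b_{mn}\end{pmatrix}.$$
Since $\mathbf{e}_1$ is the first basis vector in E, we have $\mathbf{e}_1=(1,0,\dots,0)_E$, and the image of $\mathbf{e}_1$ under t is given by
$$\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix}_E\mapsto\begin{pmatrix}b_{11}&b_{12}&\cdots&b_{1n}\\b_{21}&b_{22}&\cdots&b_{2n}\\\vdots&\vdots&&\vdots\\b_{m1}&b_{m2}&\cdots&b_{mn}\end{pmatrix}\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix}_E=\begin{pmatrix}b_{11}\\b_{21}\\\vdots\\b_{m1}\end{pmatrix}_F;$$
that is,
$$t(\mathbf{e}_1)=(b_{11},b_{21},\dots,b_{m1})_F.$$
However,
$$t(\mathbf{e}_1)=(a_{11},a_{21},\dots,a_{m1})_F,$$
so, since the F-coordinate representation of a vector is unique, the first column of B is equal to the first column of A.

Similarly, we find that
$$\begin{aligned}t(\mathbf{e}_2)&=(b_{12},b_{22},\dots,b_{m2})_F=(a_{12},a_{22},\dots,a_{m2})_F,\\t(\mathbf{e}_3)&=(b_{13},b_{23},\dots,b_{m3})_F=(a_{13},a_{23},\dots,a_{m3})_F,\\&\ \,\vdots\\t(\mathbf{e}_n)&=(b_{1n},b_{2n},\dots,b_{mn})_F=(a_{1n},a_{2n},\dots,a_{mn})_F.\end{aligned}$$

Therefore each subsequent column of B is also the same as the corresponding column of A. Since A and B are both $m\times n$ matrices and their corresponding entries are equal, we have $B=A$.

Thus A is the only matrix of t with respect to the bases E and F. ■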




2.2 An equivalent definition 


We have shown that any linear transformation $t:V\to W$, where V and W are finite-dimensional vector spaces, has a matrix representation. We now show the converse: that a function that has a matrix representation is a linear transformation. We will use the following result about matrix multiplication: if A and B are matrices and $\alpha$ is a scalar, then $(\alpha A)B=A(\alpha B)$, whenever this product exists. You might like to prove this result yourself; it is included as a ‘challenging’ exercise in the additional exercises booklet for this unit.


Theorem C41 


Let $t:V\to W$ be a function that has a matrix representation. Then t is a linear transformation.


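Proof Suppose that the function $t:V\to W$ has a matrix representation
$$\mathbf{v}_E\mapsto A\mathbf{v}_E=t(\mathbf{v})_F.$$
▶ We first show that t satisfies LT1: that for all $\mathbf{v}_1,\mathbf{v}_2\in V$ we have $t(\mathbf{v}_1+\mathbf{v}_2)=t(\mathbf{v}_1)+t(\mathbf{v}_2)$. ◀
Let $\mathbf{v}_1,\mathbf{v}_2\in V$. Then
$$t(\mathbf{v}_1+\mathbf{v}_2)_F=A(\mathbf{v}_1+\mathbf{v}_2)_E$$
and
$$\begin{aligned}t(\mathbf{v}_1)_F+t(\mathbf{v}_2)_F&=A(\mathbf{v}_1)_E+A(\mathbf{v}_2)_E\\&=A\big((\mathbf{v}_1)_E+(\mathbf{v}_2)_E\big)\\&=A(\mathbf{v}_1+\mathbf{v}_2)_E,\end{aligned}$$
by the distributive property for matrix multiplication, and because the E-coordinates of a sum of vectors are the sums of their E-coordinates.

So $t(\mathbf{v}_1+\mathbf{v}_2)_F=t(\mathbf{v}_1)_F+t(\mathbf{v}_2)_F$, and hence $t(\mathbf{v}_1+\mathbf{v}_2)=t(\mathbf{v}_1)+t(\mathbf{v}_2)$, because the F-coordinate representation of a vector is unique. Therefore the function t satisfies LT1.

▶ We now show that t satisfies LT2: that for all $\mathbf{v}\in V$ and $\alpha\in\mathbb{R}$ we have $t(\alpha\mathbf{v})=\alpha t(\mathbf{v})$. ◀

Let $\mathbf{v}\in V$ and $\alpha\in\mathbb{R}$. Then
$$\alpha t(\mathbf{v})_F=\alpha A\mathbf{v}_E$$
and
$$t(\alpha\mathbf{v})_F=A(\alpha\mathbf{v})_E=A(\alpha\mathbf{v}_E)=\alpha A\mathbf{v}_E,$$
by the result about matrix multiplication quoted above.

So $t(\alpha\mathbf{v})_F=\alpha t(\mathbf{v})_F$, and hence $t(\alpha\mathbf{v})=\alpha t(\mathbf{v})$, because the F-coordinate representation of a vector is unique. Therefore the function t also satisfies LT2.

Since both LT1 and LT2 are satisfied, the function t is a linear transformation. ■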




Theorems C40 and C41 imply the following. 


Corollary C42 


A function $t:V\to W$, where V and W are finite-dimensional vector spaces, is a linear transformation if and only if it has a matrix representation.


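This means, for example, that the linear transformations from $\mathbb{R}^2$ to itself are those functions that have a matrix representation
$$\begin{pmatrix}x\\y\end{pmatrix}\mapsto\begin{pmatrix}a&b\\c&d\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}ax+by\\cx+dy\end{pmatrix}.$$
So the linear transformations from $\mathbb{R}^2$ to itself are those functions of the form
$$t:\mathbb{R}^2\to\mathbb{R}^2,\quad (x,y)\mapsto(ax+by,\ cx+dy)\tag{1}$$
for some $a,b,c,d\in\mathbb{R}$.

Similar expressions exist for linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$.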


Exercise C95 


Use the linear transformation form (1) to determine which of the following functions are linear transformations.

(a) $t:\mathbb{R}^2\to\mathbb{R}^2,\ (x,y)\mapsto(y,\ 2x+y)$   (b) $t:\mathbb{R}^2\to\mathbb{R}^2,\ (x,y)\mapsto(2,\ y)$
(c) $t:\mathbb{R}^2\to\mathbb{R}^2,\ (x,y)\mapsto(x,\ 2xy+y)$   (d) $t:\mathbb{R}^2\to\mathbb{R}^2,\ (x,y)\mapsto(3x,\ x+4y)$


3 Composition and invertibility 


In this section you will use the matrix representation of a linear 
transformation to find composite linear transformations and investigate 
properties of linear transformations, such as invertibility. 


3.1 Composition Rule 


In the previous section you saw that a function $t:V\to W$, where V and W are finite-dimensional vector spaces, is a linear transformation if and only if it has a matrix representation. We now use some of the properties of matrices that you met in Unit C1 to develop our understanding of linear transformations.




We begin by considering the composition of linear transformations. The composite of two functions $t:V\to W$ and $s:W\to X$ is
$$s\circ t:V\to X,\quad \mathbf{v}\mapsto s(t(\mathbf{v})),$$
as shown in Figure 18.

Figure 18 The composite $s\circ t$


Consider the linear transformations
$$t:\mathbb{R}^2\to\mathbb{R}^2,\ (x,y)\mapsto(x+2y,\ y)\quad\text{and}\quad s:\mathbb{R}^2\to\mathbb{R}^2,\ (x,y)\mapsto(5x,\ x+y).\tag{2}$$
Let $(x,y)\in\mathbb{R}^2$. Then
$$t(x,y)=(x+2y,\ y),$$
so
$$\begin{aligned}s(t(x,y))&=s(x+2y,\ y)\\&=\big(5(x+2y),\ (x+2y)+y\big)\\&=(5x+10y,\ x+3y).\end{aligned}$$
Thus the composite function $s\circ t$ is the linear transformation
$$s\circ t:\mathbb{R}^2\to\mathbb{R}^2,\quad (x,y)\mapsto(5x+10y,\ x+3y).$$


In general, for linear transformations s and t from a vector space to itself, the composite functions $s\circ t$ and $t\circ s$ are not the same, as you will see in the following exercise.


Exercise C96 


Let p and r be the linear transformations 
p: R? — R? r : R? — R? 
an 
(x,y) (3a + y, =T) (x,y) — (x,£ +y). 
Find the following composite functions.

(a) $r\circ p$   (b) $p\circ r$




Each of the composite functions in Exercise C96 is a linear transformation, 
since it has the correct form (1). In the next theorem (Theorem C43) we 
show that composition of two linear transformations always gives a linear 
transformation. 


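At the beginning of this subsection we showed that the two linear transformations s and t in equation (2) can be composed to give the linear transformation
$$s\circ t:\mathbb{R}^2\to\mathbb{R}^2,\quad (x,y)\mapsto(5x+10y,\ x+3y).$$
Using Strategy C15, we obtain the matrix representations of these three linear transformations with respect to the standard basis for $\mathbb{R}^2$:
$$t:\ \begin{pmatrix}x\\y\end{pmatrix}\mapsto\begin{pmatrix}1&2\\0&1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix},\qquad s:\ \begin{pmatrix}x\\y\end{pmatrix}\mapsto\begin{pmatrix}5&0\\1&1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix},\qquad s\circ t:\ \begin{pmatrix}x\\y\end{pmatrix}\mapsto\begin{pmatrix}5&10\\1&3\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}.$$
We can check that
$$\begin{pmatrix}5&10\\1&3\end{pmatrix}=\begin{pmatrix}5&0\\1&1\end{pmatrix}\begin{pmatrix}1&2\\0&1\end{pmatrix},$$
so, in this example,
$$\begin{pmatrix}\text{matrix}\\\text{of } s\circ t\end{pmatrix}=\begin{pmatrix}\text{matrix}\\\text{of } s\end{pmatrix}\begin{pmatrix}\text{matrix}\\\text{of } t\end{pmatrix}.$$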

We now show that this relationship between the matrices of $s\circ t$, s and t holds in general; that is, that composition of linear transformations corresponds to matrix multiplication. If you are short of time, you should just look at the structure of this proof and come back to it when time permits; part (a) checks the properties LT1 and LT2, and part (b) constructs the matrix of the composite linear transformation. To help visualise what is going on, the composite $s\circ t$, vector spaces and bases are shown in Figure 19.

Figure 19 The composite $s\circ t$ showing the vector spaces and bases


Theorem C43 Composition Rule
Let $t:V\to W$ and $s:W\to X$ be linear transformations. Then:
(a) $s\circ t:V\to X$ is a linear transformation

(b) if A is the matrix of t with respect to the bases E and F, and B is the matrix of s with respect to the bases F and G, then BA is the matrix of $s\circ t$ with respect to the bases E and G.


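Proof Let $t:V\to W$ and $s:W\to X$ be linear transformations.

(a) We first show that $s\circ t$ satisfies LT1: that for all $\mathbf v_1,\mathbf v_2\in V$ we have $(s\circ t)(\mathbf v_1+\mathbf v_2)=(s\circ t)(\mathbf v_1)+(s\circ t)(\mathbf v_2)$.

Let $\mathbf v_1,\mathbf v_2\in V$. Then, since $t$ and $s$ both satisfy LT1, we have
$$\begin{aligned}(s\circ t)(\mathbf v_1+\mathbf v_2)&=s(t(\mathbf v_1+\mathbf v_2))\\&=s(t(\mathbf v_1)+t(\mathbf v_2))\\&=s(t(\mathbf v_1))+s(t(\mathbf v_2)).\end{aligned}$$
We also have
$$(s\circ t)(\mathbf v_1)+(s\circ t)(\mathbf v_2)=s(t(\mathbf v_1))+s(t(\mathbf v_2)).$$
So $(s\circ t)(\mathbf v_1+\mathbf v_2)=(s\circ t)(\mathbf v_1)+(s\circ t)(\mathbf v_2)$. Therefore the composite $s\circ t$ satisfies LT1.

We now show that $s\circ t$ satisfies LT2: that for all $\mathbf v\in V$, $\alpha\in\mathbb{R}$ we have $(s\circ t)(\alpha\mathbf v)=\alpha(s\circ t)(\mathbf v)$.

Let $\mathbf v\in V$ and $\alpha\in\mathbb{R}$. Then, since $t$ and $s$ both satisfy LT2, we have
$$(s\circ t)(\alpha\mathbf v)=s(t(\alpha\mathbf v))=s(\alpha t(\mathbf v))=\alpha s(t(\mathbf v)).$$
We also have
$$\alpha(s\circ t)(\mathbf v)=\alpha s(t(\mathbf v)).$$
So $(s\circ t)(\alpha\mathbf v)=\alpha(s\circ t)(\mathbf v)$. Therefore the composite $s\circ t$ satisfies LT2.

Since both LT1 and LT2 are satisfied, the composite $s\circ t$ is a linear transformation.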


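(b) Suppose that the vector spaces $V$, $W$ and $X$ have dimensions $n$, $m$ and $p$, respectively. Then $A$ is an $m\times n$ matrix of the form
$$A=\begin{pmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}\end{pmatrix}$$
and $B$ is a $p\times m$ matrix of the form
$$B=\begin{pmatrix}b_{11}&b_{12}&\cdots&b_{1m}\\b_{21}&b_{22}&\cdots&b_{2m}\\\vdots&\vdots&&\vdots\\b_{p1}&b_{p2}&\cdots&b_{pm}\end{pmatrix}.$$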




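We use Strategy C15 to find the matrix of the linear transformation $s\circ t$ with respect to the bases $E$ and $G$.

We find the images under $s\circ t$ of the vectors $\mathbf e_1,\dots,\mathbf e_n$ that form the basis $E$ for $V$.

To find the image of the basis vector $\mathbf e_1$, we use the $n\times1$ column matrix containing the coordinates of $\mathbf e_1$ with respect to the basis $E$. This matrix has 1 in the first row and 0 elsewhere. Using the matrix representations of $t$ and $s$, we find that
$$t:\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix}_E\longmapsto\begin{pmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}\end{pmatrix}\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix}=\begin{pmatrix}a_{11}\\a_{21}\\\vdots\\a_{m1}\end{pmatrix}_F$$
and
$$s:\begin{pmatrix}a_{11}\\a_{21}\\\vdots\\a_{m1}\end{pmatrix}_F\longmapsto\begin{pmatrix}b_{11}&b_{12}&\cdots&b_{1m}\\b_{21}&b_{22}&\cdots&b_{2m}\\\vdots&\vdots&&\vdots\\b_{p1}&b_{p2}&\cdots&b_{pm}\end{pmatrix}\begin{pmatrix}a_{11}\\a_{21}\\\vdots\\a_{m1}\end{pmatrix}=\begin{pmatrix}b_{11}a_{11}+\cdots+b_{1m}a_{m1}\\b_{21}a_{11}+\cdots+b_{2m}a_{m1}\\\vdots\\b_{p1}a_{11}+\cdots+b_{pm}a_{m1}\end{pmatrix}_G.$$
So
$$(s\circ t)(\mathbf e_1)=(b_{11}a_{11}+\cdots+b_{1m}a_{m1},\;\dots,\;b_{p1}a_{11}+\cdots+b_{pm}a_{m1})_G.$$
Similarly, we find that, for $k=2,\dots,n$,
$$(s\circ t)(\mathbf e_k)=(b_{11}a_{1k}+\cdots+b_{1m}a_{mk},\;\dots,\;b_{p1}a_{1k}+\cdots+b_{pm}a_{mk})_G.$$
Next, we would find the $G$-coordinates of each of the image vectors, but the image vectors are already in this form.

We now construct the matrix of $s\circ t$, column by column. The first column contains the coordinates of $(s\circ t)(\mathbf e_1)$, the second column contains the coordinates of $(s\circ t)(\mathbf e_2)$, and so on. Thus the matrix of $s\circ t$ with respect to the bases $E$ and $G$ is
$$\begin{pmatrix}
b_{11}a_{11}+\cdots+b_{1m}a_{m1} & \cdots & b_{11}a_{1k}+\cdots+b_{1m}a_{mk} & \cdots & b_{11}a_{1n}+\cdots+b_{1m}a_{mn}\\
\vdots & & \vdots & & \vdots\\
b_{j1}a_{11}+\cdots+b_{jm}a_{m1} & \cdots & b_{j1}a_{1k}+\cdots+b_{jm}a_{mk} & \cdots & b_{j1}a_{1n}+\cdots+b_{jm}a_{mn}\\
\vdots & & \vdots & & \vdots\\
b_{p1}a_{11}+\cdots+b_{pm}a_{m1} & \cdots & b_{p1}a_{1k}+\cdots+b_{pm}a_{mk} & \cdots & b_{p1}a_{1n}+\cdots+b_{pm}a_{mn}
\end{pmatrix}.$$
Using the rules for matrix multiplication, we find that the above matrix is the same as the matrix product $BA$: its $(j,k)$-entry $b_{j1}a_{1k}+\cdots+b_{jm}a_{mk}$ is the product of row $j$ of $B$ and column $k$ of $A$.

Thus $BA$ is the matrix of $s\circ t$ with respect to the bases $E$ and $G$. $\square$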
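Part (b) can also be spot-checked numerically. The Python sketch below is illustrative only: the matrices $A$ and $B$ are hypothetical, and vectors are identified with their standard coordinates. It builds $BA$ and confirms that applying $A$ and then $B$ gives the same result as applying $BA$ directly; in particular, column $k$ of $BA$ holds the coordinates of $(s\circ t)(\mathbf e_k)$.

```python
def mat_vec(M, v):
    """Multiply matrix M (a list of rows) by the column vector v (a list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def mat_mul(B, A):
    """Return the product BA, where B is p x m and A is m x n."""
    m = len(A)
    assert all(len(row) == m for row in B), "columns of B must match rows of A"
    n = len(A[0])
    return [[sum(B[i][k] * A[k][j] for k in range(m)) for j in range(n)]
            for i in range(len(B))]

# Hypothetical matrices: t: R^3 -> R^2 has matrix A (2 x 3) and
# s: R^2 -> R^2 has matrix B (2 x 2), both w.r.t. the standard bases.
A = [[1, 0, 2],
     [3, 1, 0]]
B = [[2, 1],
     [0, 1]]
BA = mat_mul(B, A)

# Applying t and then s agrees with applying BA directly.
for v in ([1, 0, 0], [0, 1, 0], [0, 0, 1], [2, -1, 5]):
    assert mat_vec(B, mat_vec(A, v)) == mat_vec(BA, v)
print(BA)  # [[5, 1, 4], [3, 1, 0]]
```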




Worked Exercise C57 


Use the Composition Rule to find the matrix representation of the linear transformation $s\circ t$ with respect to the standard bases for the domain and codomain.
t: R? — R? and s : R? — R? 
E (> G a) G) 
: o 1 3/\” a Š 


Exercise C97 


Use the Composition Rule to find the matrix representation of the linear transformation $s\circ t$ with respect to the standard bases for the domain and codomain.
t: RÉ — R? and s : R? — R3 
x 2 il 


_,fi 024 (J) [0 2 (3) 
2103 Y 1 0/ Y 




We now return to two examples of linear transformations of vector spaces of polynomials:
$$t:P_3\to P_3,\qquad p(x)\mapsto p(x)+p(2)$$
and
$$s:P_3\to P_2,\qquad p(x)\mapsto p'(x).$$
We compose these linear transformations as follows:
$$\begin{aligned}(s\circ t)(p(x))&=s(t(p(x)))\\&=s(p(x)+p(2))\\&=(p(x)+p(2))'\\&=p'(x),\end{aligned}$$
since $p(2)$ is a constant, so its derivative is zero.

Thus the composite is
$$s\circ t:P_3\to P_2,\qquad p(x)\mapsto p'(x).$$
In this case, the functions $s\circ t$ and $s$ are the same function.


Exercise C98 


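Use the Composition Rule to find the matrix representation of the linear transformation $s\circ t$ with respect to the standard bases $E=\{1,x,x^2\}$ for $P_3$ and $F=\{1,x\}$ for $P_2$, when
$$s:P_3\to P_2,\quad p(x)\mapsto p'(x)\qquad\text{and}\qquad t:P_3\to P_3,\quad p(x)\mapsto p(x)+p(2).$$
(In Worked Exercise C53(b) and Exercise C92(b) you found that $s$ and $t$ have the following matrix representations
$$s:P_3\to P_2,\qquad\begin{pmatrix}a\\b\\c\end{pmatrix}_E\mapsto\begin{pmatrix}0&1&0\\0&0&2\end{pmatrix}\begin{pmatrix}a\\b\\c\end{pmatrix}=\begin{pmatrix}b\\2c\end{pmatrix}_F$$
and
$$t:P_3\to P_3,\qquad\begin{pmatrix}a\\b\\c\end{pmatrix}_E\mapsto\begin{pmatrix}2&2&4\\0&1&0\\0&0&1\end{pmatrix}\begin{pmatrix}a\\b\\c\end{pmatrix}=\begin{pmatrix}2a+2b+4c\\b\\c\end{pmatrix}_E$$
with respect to the standard bases $E$ and $F$ for $P_3$ and $P_2$, respectively.)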
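As a sanity check on these two matrix representations (not a solution to the exercise), here is a minimal Python sketch. It assumes that a polynomial $a+bx+cx^2$ in $P_3$ is stored as the coefficient triple $(a,b,c)$, that is, its $E$-coordinates; the function names are illustrative only. Each map is computed directly and compared with multiplication by its matrix.

```python
# Polynomials p(x) = a + b*x + c*x**2 in P_3 are stored as coefficient triples (a, b, c).
S = [[0, 1, 0],
     [0, 0, 2]]   # matrix of s (differentiation) w.r.t. E = {1, x, x^2} and F = {1, x}
T = [[2, 2, 4],
     [0, 1, 0],
     [0, 0, 1]]   # matrix of t(p) = p(x) + p(2) w.r.t. E and E

def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a coordinate vector."""
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)

def s_direct(a, b, c):
    """Coefficients of p'(x) = b + 2c x."""
    return (b, 2 * c)

def t_direct(a, b, c):
    """Coefficients of p(x) + p(2): the constant p(2) = a + 2b + 4c joins the constant term."""
    return (a + (a + 2 * b + 4 * c), b, c)

for coeffs in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (3, -1, 2)]:
    assert mat_vec(S, coeffs) == s_direct(*coeffs)
    assert mat_vec(T, coeffs) == t_direct(*coeffs)
print("matrix representations agree with the direct formulas")
```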




In Subsection 3.2 of Unit C1 we claimed that multiplication of matrices is 
associative. We now prove this result, by using the Composition Rule 
(Theorem C43). 


Corollary C44
Let $A$, $B$ and $C$ be matrices of sizes $q\times p$, $p\times m$ and $m\times n$, respectively. Then
$$A(BC)=(AB)C.$$

Proof Let $t$, $s$ and $r$ be the linear transformations whose matrix representations with respect to the standard bases for the domain and codomain are
$$t:\mathbb{R}^n\to\mathbb{R}^m,\ \mathbf v\mapsto C\mathbf v,\qquad s:\mathbb{R}^m\to\mathbb{R}^p,\ \mathbf v\mapsto B\mathbf v\qquad\text{and}\qquad r:\mathbb{R}^p\to\mathbb{R}^q,\ \mathbf v\mapsto A\mathbf v.$$
It follows from the Composition Rule that $A(BC)$ is the matrix of the linear transformation $r\circ(s\circ t)$ and that $(AB)C$ is the matrix of the linear transformation $(r\circ s)\circ t$, with respect to the standard bases for the domain and codomain. The linear transformations $r\circ(s\circ t)$ and $(r\circ s)\circ t$ are equal, since $(r\circ(s\circ t))(\mathbf v)$ and $((r\circ s)\circ t)(\mathbf v)$ both mean $r(s(t(\mathbf v)))$. It follows that $A(BC)=(AB)C$. $\square$
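A numerical spot check is no substitute for the proof above, but it can make the size bookkeeping concrete. The Python sketch below uses randomly generated integer matrices of the sizes in Corollary C44 (the particular sizes and the seed are arbitrary choices); integer arithmetic is exact, so the comparison involves no rounding.

```python
import random

def mat_mul(X, Y):
    """Product XY of matrices given as lists of rows."""
    assert len(X[0]) == len(Y), "inner dimensions must agree"
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def random_matrix(rows, cols, rng):
    """A rows x cols matrix of small random integers."""
    return [[rng.randint(-5, 5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(208)   # fixed seed so the run is reproducible
q, p, m, n = 2, 3, 4, 2    # A is q x p, B is p x m, C is m x n
A = random_matrix(q, p, rng)
B = random_matrix(p, m, rng)
C = random_matrix(m, n, rng)

left = mat_mul(A, mat_mul(B, C))    # A(BC)
right = mat_mul(mat_mul(A, B), C)   # (AB)C
print(left == right)  # True
```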


This result illustrates how we can prove results about matrices by using 
linear transformations. We can also prove results about linear 
transformations by using matrices, as we do in the next subsection. 


3.2 Invertible linear transformations 


In this subsection we introduce the notion of an invertible linear transformation. Suppose that $t:V\to W$ is a linear transformation that is one-to-one (no two elements of $V$ have the same image) and is also onto (the image set $t(V)$ is the whole of $W$); that is, each element of $W$ is the image of exactly one element of $V$. Then $t$ has an inverse function $t^{-1}$ with domain $W$, such that
$$t^{-1}(t(\mathbf v))=\mathbf v\quad\text{for each }\mathbf v\in V,$$
and
$$t(t^{-1}(\mathbf w))=\mathbf w\quad\text{for each }\mathbf w\in W;$$
that is,
$$t^{-1}\circ t=i_V\quad\text{and}\quad t\circ t^{-1}=i_W.$$


We say that t is invertible. This is illustrated in Figure 20. 


Figure 20 A linear transformation $t$ and its inverse $t^{-1}$


Definition
The linear transformation $t:V\to W$ is invertible if there exists an inverse function $t^{-1}:W\to V$ such that
$$t^{-1}\circ t=i_V\quad\text{and}\quad t\circ t^{-1}=i_W.$$

Thus a linear transformation $t:V\to W$ is invertible if and only if it is one-to-one and onto.


The linear transformation
$$t:\mathbb{R}^2\to\mathbb{R}^2,\qquad (x,y)\mapsto(x,0)$$
is not invertible, since it is not one-to-one; for example,
$$t(1,1)=t(1,2)=(1,0).$$
The linear transformation
$$t:\mathbb{R}^2\to\mathbb{R}^3,\qquad (x,y)\mapsto(x,y,0)$$
is not invertible, since it is not onto: the image set $t(\mathbb{R}^2)$ is the $(x,y)$-plane, which is not the whole of the codomain $\mathbb{R}^3$.


Now consider the linear transformation
$$t:\mathbb{R}^2\to\mathbb{R}^2,\qquad (x,y)\mapsto(2x,2y).$$
We can check that $t$ is one-to-one and onto, and hence invertible, by using the methods of Unit A1; but what is the inverse function of $t$?


Figure 21 The linear transformation $t$ with matrix $A$, and its inverse $t^{-1}$ with matrix $A^{-1}$


Since $t$ stretches each vector by a factor 2, we expect the inverse function of $t$ to be the linear transformation
$$s:\mathbb{R}^2\to\mathbb{R}^2,\qquad (x,y)\mapsto\left(\tfrac12x,\;\tfrac12y\right),$$
which contracts each vector to half its magnitude. Since
$$s(t(x,y))=s(2x,2y)=(x,y)$$
and
$$t(s(x,y))=t\left(\tfrac12x,\;\tfrac12y\right)=(x,y)$$
for each vector $(x,y)$ in $\mathbb{R}^2$, $s\circ t$ and $t\circ s$ are both the identity transformation of $\mathbb{R}^2$, so $s$ is the inverse function of $t$.
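The two identities $s\circ t=i_{\mathbb{R}^2}$ and $t\circ s=i_{\mathbb{R}^2}$ can be spot-checked on sample vectors. This Python sketch only checks finitely many vectors, so it illustrates rather than proves the claim; it uses the standard-library `fractions.Fraction` type so that halving is exact.

```python
from fractions import Fraction

def t(v):
    """t(x, y) = (2x, 2y): stretches each vector by a factor 2."""
    x, y = v
    return (2 * x, 2 * y)

def s(v):
    """s(x, y) = (x/2, y/2); Fraction keeps the halving exact."""
    x, y = v
    return (Fraction(x) / 2, Fraction(y) / 2)

samples = [(1, 0), (0, 1), (3, -7), (Fraction(1, 3), 5)]
for v in samples:
    assert s(t(v)) == v   # s o t acts as the identity on this sample
    assert t(s(v)) == v   # t o s acts as the identity on this sample
print("s(t(v)) == v and t(s(v)) == v for every sample")
```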


Exercise C99 


Verify that the linear transformation
$$s:\mathbb{R}^2\to\mathbb{R}^2,\qquad (x,y)\mapsto(x+y,\;3x+4y)$$
is the inverse function of the linear transformation
$$t:\mathbb{R}^2\to\mathbb{R}^2,\qquad (x,y)\mapsto(4x-y,\;-3x+y).$$


In fact, the inverse of any invertible linear transformation is itself a linear transformation. Unfortunately, it is not always obvious whether a given linear transformation $t:V\to W$ is invertible. Even if we know that $t$ is one-to-one and onto, and hence invertible, it may not be clear what the inverse function of $t$ is. If $V$ and $W$ are both finite-dimensional vector spaces, however, then $t$ has a matrix representation. The next theorem, illustrated in Figure 21, shows that this can be used to determine whether $t$ is invertible and, if so, to find the inverse function of $t$. If you are short of time, just look at the structure of the proof and come back to it when time permits.


Theorem C45 Inverse Rule
Let $t:V\to W$ be a linear transformation.
(a) If $t$ is invertible, then $t^{-1}:W\to V$ is also a linear transformation.
(b) If $A$ is the matrix of $t$ with respect to the bases $E$ and $F$, then:
(i) $t$ is invertible if and only if $A$ is invertible
(ii) if $t$ is invertible, then $A^{-1}$ is the matrix of $t^{-1}$ with respect to the bases $F$ and $E$.


3 Composition and invertibility 


Proof Let $t:V\to W$ be a linear transformation.

(a) Suppose that $t$ is invertible.

We use Strategy C14 to show that the inverse function $t^{-1}:W\to V$ is a linear transformation.

We first show that $t^{-1}$ satisfies LT1: for all $\mathbf w_1,\mathbf w_2\in W$ we have $t^{-1}(\mathbf w_1+\mathbf w_2)=t^{-1}(\mathbf w_1)+t^{-1}(\mathbf w_2)$.

Let $\mathbf w_1,\mathbf w_2\in W$. Then, since $t$ is invertible and hence onto, there exist $\mathbf v_1,\mathbf v_2\in V$ such that $\mathbf w_1=t(\mathbf v_1)$ and $\mathbf w_2=t(\mathbf v_2)$. Since $t$ satisfies LT1, we have
$$\begin{aligned}t^{-1}(\mathbf w_1+\mathbf w_2)&=t^{-1}(t(\mathbf v_1)+t(\mathbf v_2))\\&=t^{-1}(t(\mathbf v_1+\mathbf v_2))\\&=\mathbf v_1+\mathbf v_2.\end{aligned}$$
Also,
$$t^{-1}(\mathbf w_1)+t^{-1}(\mathbf w_2)=t^{-1}(t(\mathbf v_1))+t^{-1}(t(\mathbf v_2))=\mathbf v_1+\mathbf v_2.$$
So $t^{-1}(\mathbf w_1+\mathbf w_2)=t^{-1}(\mathbf w_1)+t^{-1}(\mathbf w_2)$. Therefore $t^{-1}$ satisfies LT1.

Next we show that $t^{-1}$ satisfies LT2: for all $\mathbf w\in W$, $\alpha\in\mathbb{R}$ we have $t^{-1}(\alpha\mathbf w)=\alpha t^{-1}(\mathbf w)$.

Let $\mathbf w\in W$; then there exists $\mathbf v\in V$ such that $\mathbf w=t(\mathbf v)$. Let $\alpha\in\mathbb{R}$; then, since $t$ satisfies LT2, we have
$$t^{-1}(\alpha\mathbf w)=t^{-1}(\alpha t(\mathbf v))=t^{-1}(t(\alpha\mathbf v))=\alpha\mathbf v.$$
Also,
$$\alpha t^{-1}(\mathbf w)=\alpha t^{-1}(t(\mathbf v))=\alpha\mathbf v.$$
So $t^{-1}(\alpha\mathbf w)=\alpha t^{-1}(\mathbf w)$ and LT2 is satisfied.

Since both LT1 and LT2 are satisfied, $t^{-1}$ is a linear transformation.

(b) Let $A$ be the matrix of $t$ with respect to the bases $E$ and $F$, so
$$t:\mathbf v_E\mapsto A\mathbf v_E=t(\mathbf v)_F,\quad\text{for any vector }\mathbf v\in V.$$

We prove the ‘if’ statement and show that if $A$ is invertible, then $t$ is invertible. Using properties of matrices, if $A$ is invertible, then $AA^{-1}=I=A^{-1}A$, where $I$ is the identity matrix.

Suppose that $A$ is invertible. Then we know that $A$ is a square matrix and $A^{-1}$ is also square (and of the same size); so we can define $s$ to be the linear transformation with the matrix representation
$$s:W\to V,\qquad\mathbf w_F\mapsto A^{-1}\mathbf w_F=s(\mathbf w)_E.$$
We show that $s$ is the inverse function of $t$, and hence that $t$ is invertible.




It follows from the Composition Rule that $s\circ t$ has the matrix representation
$$s\circ t:V\to V,\qquad\mathbf v_E\mapsto(A^{-1}A)\mathbf v_E=I\mathbf v_E=\mathbf v_E.$$
Thus $s(t(\mathbf v))=\mathbf v$ for each $\mathbf v\in V$; that is, $s\circ t=i_V$.

Similarly, it follows from the Composition Rule that $t\circ s$ has the matrix representation
$$t\circ s:W\to W,\qquad\mathbf w_F\mapsto(AA^{-1})\mathbf w_F=I\mathbf w_F=\mathbf w_F.$$
Thus $t(s(\mathbf w))=\mathbf w$ for each $\mathbf w\in W$; that is, $t\circ s=i_W$.
Since $s\circ t=i_V$ and $t\circ s=i_W$, it follows that $s$ is the inverse function of $t$, so $t$ is invertible.

We prove the ‘only if’ statement and show that if $t$ is invertible, then $A$ is invertible and $A^{-1}$ is the matrix of $t^{-1}$ with respect to the bases $F$ and $E$.

Suppose that $t$ is invertible, so $t^{-1}$ is a linear transformation, by part (a). Then by Theorem C40 it has a matrix representation
$$t^{-1}:W\to V,\qquad\mathbf w_F\mapsto B\mathbf w_F=t^{-1}(\mathbf w)_E.$$
We show that $B=A^{-1}$.

It follows from the Composition Rule that $t^{-1}\circ t$ has the matrix representation
$$t^{-1}\circ t:V\to V,\qquad\mathbf v_E\mapsto(BA)\mathbf v_E.$$
Since $(t^{-1}\circ t)(\mathbf v)=\mathbf v$ for each $\mathbf v\in V$, it follows that
$$(BA)\mathbf v_E=\mathbf v_E\quad\text{for all }\mathbf v\in V.$$
Thus $BA=I$.

Similarly, it follows from the Composition Rule that $t\circ t^{-1}$ has the matrix representation
$$t\circ t^{-1}:W\to W,\qquad\mathbf w_F\mapsto(AB)\mathbf w_F.$$
Since $(t\circ t^{-1})(\mathbf w)=\mathbf w$ for each $\mathbf w\in W$, it follows that
$$(AB)\mathbf w_F=\mathbf w_F\quad\text{for all }\mathbf w\in W.$$
Thus $AB=I$.
Since
$$BA=AB=I,$$
it follows that $A$ is invertible and $B=A^{-1}$. Therefore $A^{-1}$ is the matrix of $t^{-1}$ with respect to the bases $F$ and $E$.

This completes the proof. $\square$




One consequence of the Inverse Rule is that if $t:V\to W$ is an invertible linear transformation, then any matrix of $t$ must be invertible and hence square. Since a matrix of $t$ has $m$ rows and $n$ columns, where $m$ is the dimension of $W$ and $n$ is the dimension of $V$, it follows that $m=n$; that is, the vector spaces $V$ and $W$ must have the same dimension, and we have the following corollary to Theorem C45.


Corollary C46
Let $t:V\to W$ be an invertible linear transformation, where $V$ and $W$ are finite-dimensional. Then
$$\dim V=\dim W.$$


It follows that if $t:V\to W$ is a linear transformation and $V$ and $W$ have different finite dimensions, then $t$ is not invertible. For example, the linear transformation


$t:\mathbb{R}^3\to\mathbb{R}^2$
(zyz) > (2x ! Y, £ y) 


is not invertible, since the domain and codomain have different dimensions. 


Now suppose that $t:V\to W$ is a linear transformation and that $V$ and $W$ have the same finite dimension. It follows from the Inverse Rule that you can use the following strategy to determine whether or not $t$ is invertible. Recall from Subsection 5.4 of Unit C1 that a matrix is invertible if and only if its determinant is non-zero.


Strategy C16
To determine whether or not a linear transformation $t:V\to W$ is invertible, where $V$ and $W$ are $n$-dimensional vector spaces with bases $E$ and $F$, respectively, do the following.
1. Find a matrix representation of $t$,
$$\mathbf v_E\mapsto A\mathbf v_E=t(\mathbf v)_F.$$
2. Evaluate $\det A$.
• If $\det A=0$, then $t$ is not invertible.
• If $\det A\neq0$, then $t$ is invertible and $t^{-1}:W\to V$ has the matrix representation
$$\mathbf w_F\mapsto A^{-1}\mathbf w_F=t^{-1}(\mathbf w)_E.$$
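For $2\times2$ matrices, Strategy C16 is short enough to sketch in code. The Python below is a minimal illustration with hypothetical helper names; it uses the usual formula $A^{-1}=\frac{1}{ad-bc}\begin{pmatrix}d&-b\\-c&a\end{pmatrix}$ and exact `fractions.Fraction` arithmetic. The usage line applies it to the matrix of Worked Exercise C58.

```python
from fractions import Fraction

def det2(A):
    """Determinant of the 2 x 2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    return a * d - b * c

def inverse2(A):
    """Strategy C16 for a 2 x 2 matrix A: return A^{-1}, or None if det A = 0."""
    (a, b), (c, d) = A
    D = det2(A)
    if D == 0:
        return None              # det A = 0, so t is not invertible
    D = Fraction(D)              # exact arithmetic, no floating-point rounding
    return [[d / D, -b / D],
            [-c / D, a / D]]

def apply(M, v):
    """Multiply the 2 x 2 matrix M by the coordinate vector v = (x, y)."""
    (a, b), (c, d) = M
    x, y = v
    return (a * x + b * y, c * x + d * y)

# Worked Exercise C58: t(x, y) = (x + y, 2y) has matrix A w.r.t. the standard bases.
A = [[1, 1],
     [0, 2]]
A_inv = inverse2(A)                      # entries 1, -1/2, 0, 1/2 (as Fractions)
print(apply(A_inv, apply(A, (3, 5))))    # recovers (3, 5), as Fractions
```

Returning `None` for a singular matrix mirrors the first bullet of step 2: there is no inverse to report.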




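Worked Exercise C58
Show that the following linear transformation $t$ is invertible and find the inverse function of $t$.
$$t:\mathbb{R}^2\to\mathbb{R}^2,\qquad (x,y)\mapsto(x+y,\;2y)$$

Solution
We use Strategy C16 and first find a matrix representation of $t$.
We find a matrix representation of $t$ using Strategy C15.
We have
$$t(1,0)=(1,0)\quad\text{and}\quad t(0,1)=(1,2).$$
Since we have the standard basis in the codomain, the $F$-coordinates of the image vectors are immediate.

Hence the matrix representation of $t$ with respect to the standard bases for the domain and codomain is
$$\begin{pmatrix}x\\y\end{pmatrix}\mapsto\begin{pmatrix}1&1\\0&2\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}.$$
The next step is to evaluate the determinant of the matrix
$$A=\begin{pmatrix}1&1\\0&2\end{pmatrix}.$$
We have
$$\det A=\begin{vmatrix}1&1\\0&2\end{vmatrix}=1\times2-1\times0=2.$$
Since $\det A$ is non-zero, $t$ is invertible.

We now find the inverse function of $t$. According to Strategy C16, $t^{-1}:\mathbb{R}^2\to\mathbb{R}^2$ has the matrix representation $\mathbf v\mapsto A^{-1}\mathbf v$, with respect to the standard bases for the domain and codomain. Since
$$A^{-1}=\frac12\begin{pmatrix}2&-1\\0&1\end{pmatrix}=\begin{pmatrix}1&-\tfrac12\\0&\tfrac12\end{pmatrix},$$
it follows that $t^{-1}$ has the matrix representation
$$\begin{pmatrix}x\\y\end{pmatrix}\mapsto\begin{pmatrix}1&-\tfrac12\\0&\tfrac12\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}x-\tfrac12y\\\tfrac12y\end{pmatrix}.$$
So $t^{-1}$ is the linear transformation
$$t^{-1}:\mathbb{R}^2\to\mathbb{R}^2,\qquad (x,y)\mapsto\left(x-\tfrac12y,\;\tfrac12y\right).$$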




Exercise C100 


Determine which of the following linear transformations are invertible. 
Find the inverse function of each invertible linear transformation. 


(a) $t:\mathbb{R}^2\to\mathbb{R}^2,\ (x,y)\mapsto(2x+y,\;4x+2y)$
(b) $t:\mathbb{R}^2\to\mathbb{R}^2,\ (x,y)\mapsto(x-y,\;3x+y)$
(c) $t:\mathbb{R}^3\to\mathbb{R}^3,\ (x,y,z)\mapsto(2x,\;3y-x,\;z)$
(d) $t:P_3\to P_2,\ p(x)\mapsto p'(x)$


In Worked Exercise C58 we considered the linear transformation
$$t:\mathbb{R}^2\to\mathbb{R}^2,\qquad (x,y)\mapsto(x+y,\;2y).$$
We found the matrix $A$ of $t$ with respect to the standard basis for $\mathbb{R}^2$, and showed that $\det A=2$. In fact, whatever basis we had chosen for $\mathbb{R}^2$ (using the same basis for the domain and the codomain), we would still have obtained a matrix of $t$ with determinant equal to 2.

It can be shown that the magnitude of the determinant of a matrix of $t$ is the ‘scaling factor’ of $t$. Since $\det A=2$ in the above case, areas are doubled under $t$, as shown in Figure 22.


Figure 22 A linear transformation with ‘scaling factor’ 2


This ‘scaling factor’ explains the geometric interpretation of the determinant of a $2\times2$ matrix: for two position vectors $(a,c)$ and $(b,d)$, the magnitude of the determinant
$$\begin{vmatrix}a&b\\c&d\end{vmatrix}$$
gives the area of the parallelogram with adjacent sides given by these position vectors. The matrix
$$\begin{pmatrix}a&b\\c&d\end{pmatrix}$$
is the matrix, with respect to the standard basis for $\mathbb{R}^2$, of the linear transformation that maps the standard basis vectors $(1,0)$ and $(0,1)$ to $(a,c)$ and $(b,d)$, respectively.
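The ‘scaling factor’ can be checked against Figure 22 directly. The Python sketch below is illustrative only: it maps the vertices of the unit square under a $2\times2$ matrix and computes the area of the image using the shoelace formula (a standard formula for polygon area, not one used in the text). The result agrees with $|\det A|$, including when the determinant is negative.

```python
def apply(A, v):
    """Multiply the 2 x 2 matrix A = [[a, b], [c, d]] by v = (x, y)."""
    (a, b), (c, d) = A
    x, y = v
    return (a * x + b * y, c * x + d * y)

def polygon_area(points):
    """Shoelace formula: area of a simple polygon whose vertices are listed in order."""
    total = 0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
A = [[1, 1], [0, 2]]                        # matrix of t(x, y) = (x + y, 2y)
image = [apply(A, v) for v in unit_square]  # (0,0), (1,0), (2,2), (1,2)
print(polygon_area(image))                  # 2.0, matching det A = 2
```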




For a linear transformation $t:\mathbb{R}^2\to\mathbb{R}^2$ and a matrix $A$ of $t$ with $\det A=0$, the image of a unit square under $t$ is a line segment or a point; these have zero area. So, in this case, $t$ is not invertible.


3.3 Isomorphisms 


You have seen that there are invertible linear transformations from $\mathbb{R}^2$ to itself, and from $\mathbb{R}^3$ to itself. In fact, whenever the vector spaces $V$ and $W$ have the same finite dimension, we can construct an invertible linear transformation from $V$ to $W$.


For example, consider the two-dimensional vector spaces $\mathbb{R}^2$ and $P_2$. The linear transformation
$$t:P_2\to\mathbb{R}^2,\qquad a+bx\mapsto(a,b)$$
is one-to-one and onto, and hence invertible. By looking at a matrix representation of $t$ in this example, we can see how to construct a general invertible linear transformation from $V$ to $W$ whenever $V$ and $W$ have the same finite dimension.

For $t$ above, take the standard bases $E=\{1,x\}$ for $P_2$ and $F=\{(1,0),(0,1)\}$ for $\mathbb{R}^2$. Then $t(1)=(1,0)$ and $t(x)=(0,1)$, so $t$ has the matrix representation
$$\begin{pmatrix}a\\b\end{pmatrix}_E\mapsto\begin{pmatrix}1&0\\0&1\end{pmatrix}\begin{pmatrix}a\\b\end{pmatrix}=\begin{pmatrix}a\\b\end{pmatrix}_F;$$
that is,
$$\mathbf v_E\mapsto I_2\mathbf v_E=\mathbf w_F.$$
More generally, let $V$ and $W$ be $n$-dimensional vector spaces, let $E$ be a basis for $V$ and let $F$ be a basis for $W$. Then
$$t:V\to W,\qquad\mathbf v_E\mapsto I_n\mathbf v_E=\mathbf w_F$$
is a linear transformation from $V$ to $W$. Since the identity matrix $I_n$ is invertible, it follows from the Inverse Rule that $t$ is invertible. Note that $t$ maps the first basis vector in $E$ to the first basis vector in $F$, the second basis vector in $E$ to the second basis vector in $F$, and so on. We say that $t$ is an isomorphism from $V$ to $W$.


Definition
The vector spaces $V$ and $W$ are isomorphic if there exists an invertible linear transformation $t:V\to W$. Such a function $t$ is an isomorphism.




You met isomorphisms between groups in Unit B2 Subgroups and 
isomorphisms. Isomorphisms between vector spaces are analogous: they 
identify when vector spaces are ‘structurally identical’ to each other. 


Exercise C101 


Write down an isomorphism from $P_3$ to $\mathbb{R}^3$.


Although the examples of isomorphisms given above involve the identity matrix, any invertible linear transformation provides an isomorphism; so any invertible matrix of the appropriate size can be used. For example, consider the following matrix $A$ of a linear transformation $s:P_3\to\mathbb{R}^3$ with respect to the standard bases in the domain and codomain:
$$A=\begin{pmatrix}0&0&2\\1&1&2\\1&2&1\end{pmatrix}.$$
This matrix is invertible; you might like to check that it has determinant 2.

This linear transformation is likely to provide a different isomorphism between the vector spaces $P_3$ and $\mathbb{R}^3$ from the one you wrote down as your answer to Exercise C101.
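The suggested determinant check takes only a few lines. This Python sketch is illustrative: it expands along the first row, and it assumes (as for the standard basis $E=\{1,x,x^2\}$) that $a+bx+cx^2$ has coordinates $(a,b,c)$, so that $s$ multiplies those coordinates by $A$; the function names are hypothetical.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def s(a, b, c):
    """Coordinates in R^3 of s(a + b x + c x^2), obtained by multiplying (a, b, c) by A."""
    return (2 * c, a + b + 2 * c, a + 2 * b + c)

A = [[0, 0, 2],
     [1, 1, 2],
     [1, 2, 1]]
print(det3(A))  # 2, which is non-zero, so s is invertible and hence an isomorphism
```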


Suppose that $V$ and $W$ are finite-dimensional vector spaces. We have just seen that if $\dim V=\dim W$, then there is an invertible linear transformation $t:V\to W$; that is, $V$ and $W$ are isomorphic.

In particular, each $n$-dimensional vector space is isomorphic to $\mathbb{R}^n$.

We also know (Corollary C46) that if $V$ and $W$ have different dimensions, then there are no invertible linear transformations from $V$ to $W$; that is, $V$ and $W$ are not isomorphic. Thus we have proved the following result.


Theorem C47 


The finite-dimensional vector spaces V and W are isomorphic if and 
only if 


dim V = dim W. 


Exercise C102 


State which of the following vector spaces are isomorphic to each other:
$$\mathbb{R}^2,\quad\mathbb{R}^3,\quad\mathbb{C},\quad P_2,\quad P_3.$$


Figure 23 A projection function in $\mathbb{R}^3$

Figure 24 The image set of a linear transformation


4 Image and kernel 


In the previous section you saw that a linear transformation $t:V\to W$ is invertible if and only if $t$ is one-to-one and onto; that is, each element of $W$ is the image of exactly one element of $V$.


In this section you will meet a strategy for determining which elements 

of W are the images of elements of V, before investigating conditions under 
which an element of W is the image of more than one element of V. This 
enables us to prove an important result known as the Dimension Theorem. 


Finally, we use the Dimension Theorem to show how the number of 
possible solutions of a system of m linear equations in n unknowns 
depends on the values of m and n. 


4.1 Image of a linear transformation 


Let $t$ be the linear transformation
$$t:\mathbb{R}^3\to\mathbb{R}^3,\qquad (x,y,z)\mapsto(x,y,0).$$
This projects each vector $(x,y,z)$ onto the vector $(x,y,0)$ in the $(x,y)$-plane of $\mathbb{R}^3$, as shown in Figure 23.

A vector $\mathbf w$ in the codomain $\mathbb{R}^3$ is the image of a vector $\mathbf v$ in the domain $\mathbb{R}^3$ if and only if $\mathbf w$ is in the $(x,y)$-plane. We say that the $(x,y)$-plane is the image set of $t$. This is a two-dimensional subspace of the codomain $\mathbb{R}^3$.

Recall from Unit A1 that the image set of a function is the set of all elements of the codomain that are images of some element in the domain. Thus the image set of a linear transformation $t:V\to W$ is the set of all vectors of $W$ that are images of vectors of $V$, as shown in Figure 24.


The image set of a linear transformation is sometimes simply called its 
image. 


Definition
The image set of a linear transformation $t:V\to W$ is the set
$$\operatorname{Im}t=\{t(\mathbf v):\mathbf v\in V\}.$$

Note that the meaning of $\operatorname{Im}t$, which here is the image set of $t$, is different from that of Im used in Unit A2 Number systems, where Im meant the imaginary part of a complex number.

It is important to remember that the image set of $t$ is a subset of $W$, but it need not be equal to $W$ because there may be some vectors of $W$ that are not images of vectors in $V$. Also, some vectors of $W$ may be images under $t$ of more than one vector of $V$. Another, equivalent, way of expressing this image set is
$$\operatorname{Im}t=\{\mathbf w\in W:\mathbf w=t(\mathbf v)\text{ for some }\mathbf v\in V\}.$$


Exercise C103 


Give a geometric description of the image set of each of the following linear transformations. In each case, state whether the image set is a subspace of the codomain.

(a) $t:\mathbb{R}^3\to\mathbb{R}^2,\ (x,y,z)\mapsto(x,0)$
(b) $t:\mathbb{R}^2\to\mathbb{R}^2,\ (x,y)\mapsto(x,x)$


For each of the linear transformations in Exercise C103, the image set is a subspace of the codomain. This is true for all linear transformations.

Theorem C48
Let $t:V\to W$ be a linear transformation. Then $\operatorname{Im}t$ is a subspace of the codomain $W$.

Proof We follow Strategy C10 in Unit C2.
We check first that $\mathbf 0\in\operatorname{Im}t$.
Since $t$ is a linear transformation, $t(\mathbf 0)=\mathbf 0$, so $\mathbf 0\in\operatorname{Im}t$. This is illustrated in Figure 25.
We check next that $\operatorname{Im}t$ is closed under vector addition.
Let $\mathbf w_1,\mathbf w_2\in\operatorname{Im}t$. Then there exist vectors $\mathbf v_1,\mathbf v_2\in V$ such that $\mathbf w_1=t(\mathbf v_1)$ and $\mathbf w_2=t(\mathbf v_2)$. Since $t$ is a linear transformation,
$$\mathbf w_1+\mathbf w_2=t(\mathbf v_1)+t(\mathbf v_2)=t(\mathbf v_1+\mathbf v_2).$$
Since $V$ is closed under vector addition, $\mathbf v_1+\mathbf v_2\in V$, so $\mathbf w_1+\mathbf w_2\in\operatorname{Im}t$. This is illustrated in Figure 26.
Finally, we show that $\operatorname{Im}t$ is closed under scalar multiplication.
Let $\mathbf w\in\operatorname{Im}t$ and $\alpha\in\mathbb{R}$. Then there exists $\mathbf v\in V$ such that $\mathbf w=t(\mathbf v)$ and, since $t$ is a linear transformation,
$$\alpha\mathbf w=\alpha t(\mathbf v)=t(\alpha\mathbf v).$$
Since $V$ is closed under scalar multiplication, $\alpha\mathbf v\in V$, so $\alpha\mathbf w\in\operatorname{Im}t$. This is illustrated in Figure 27.
Thus $\operatorname{Im}t$ is a subspace of $W$. $\square$


For the linear transformations studied so far, it has been easy to write 
down their image sets. In general, this is not the case; so we need some 
way of determining the image set of a linear transformation. 


If we know the image of each vector in a basis for V, then we can find the 
image of each vector in V since linear combinations of vectors are 
preserved (Theorem C39). Thus the image set of t is determined by the 
images of the domain basis vectors. 


Figure 25 The image of the zero vector

Figure 26 The image of a sum of vectors

Figure 27 The image of a scalar multiple

Figure 28 The images of the domain basis vectors in the image set


For example, consider the linear transformation
$$t:\mathbb{R}^3\to\mathbb{R}^3,\qquad (x,y,z)\mapsto(x,y,0).$$
The standard basis for the domain $\mathbb{R}^3$ is $\{(1,0,0),(0,1,0),(0,0,1)\}$.

The images of the vectors in this basis are
$$(1,0,0),\quad(0,1,0),\quad(0,0,0).$$
These vectors all lie in $\operatorname{Im}t$, which is the $(x,y)$-plane, and they span $\operatorname{Im}t$; that is, each vector in $\operatorname{Im}t$ can be written as a linear combination of the vectors $(1,0,0)$, $(0,1,0)$ and $(0,0,0)$.


Exercise C104 


Let $t$ be the linear transformation
$$t:\mathbb{R}^2\to\mathbb{R}^2,\qquad (x,y)\mapsto(x,x).$$
Determine the images of the vectors in the standard basis $\{(1,0),(0,1)\}$ for the domain $\mathbb{R}^2$. Do these image vectors span $\operatorname{Im}t$?

(In Exercise C103(b) you found that the image set of this linear transformation is the line $y=x$.)


We now show that, for any linear transformation, the images of the domain basis vectors span the image set; the images of these domain basis vectors are illustrated in Figure 28.

Let $t:V\to W$ be a linear transformation and let $\{\mathbf e_1,\dots,\mathbf e_n\}$ be a basis for $V$. If $\mathbf w\in\operatorname{Im}t$, then $\mathbf w=t(\mathbf v)$ for some $\mathbf v$ in $V$. Since $\{\mathbf e_1,\dots,\mathbf e_n\}$ is a basis for $V$, there exist real numbers $v_1,\dots,v_n$ such that
$$\mathbf v=v_1\mathbf e_1+\cdots+v_n\mathbf e_n.$$
Since $t$ is a linear transformation, it preserves linear combinations of vectors (Theorem C39), so it follows that
$$\mathbf w=t(\mathbf v)=v_1t(\mathbf e_1)+\cdots+v_nt(\mathbf e_n).$$
Thus $\mathbf w$ is a linear combination of the vectors $t(\mathbf e_1),\dots,t(\mathbf e_n)$. So $\{t(\mathbf e_1),\dots,t(\mathbf e_n)\}$ is a spanning set for $\operatorname{Im}t$, as claimed.


Since a basis is a linearly independent spanning set, we now give a strategy 
that enables us to find a basis for the image set of a linear transformation. 


Strategy C17
To find a basis for $\operatorname{Im}t$, where $t:V\to W$ is a linear transformation, do the following.
1. Find a basis $\{\mathbf e_1,\dots,\mathbf e_n\}$ for the domain $V$.
2. Determine the vectors $t(\mathbf e_1),\dots,t(\mathbf e_n)$.
3. If there is a vector $\mathbf v$ in $S=\{t(\mathbf e_1),\dots,t(\mathbf e_n)\}$ that is a linear combination of the other vectors in $S$, then discard $\mathbf v$ to give the set $S_1=S-\{\mathbf v\}$.
4. If there is a vector $\mathbf v_1$ in $S_1$ such that $\mathbf v_1$ is a linear combination of the other vectors in $S_1$, then discard $\mathbf v_1$ to give the set $S_2=S_1-\{\mathbf v_1\}$.

Continue discarding vectors in this way until you obtain a linearly independent set. This set is a basis for $\operatorname{Im}t$.
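Strategy C17 can be mechanised once we have a test for ‘is a linear combination of the others’. The Python sketch below is a minimal illustration of a greedy variant, not the strategy word for word: it runs through $t(\mathbf e_1),\dots,t(\mathbf e_n)$ in order and keeps a vector only if it raises the rank of the vectors kept so far, where the rank is computed by exact Gaussian elimination with `fractions.Fraction`. The resulting basis depends on the order of the domain basis. The usage example is the transformation of Worked Exercise C59.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors (treated as rows), by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def image_basis(t, domain_basis):
    """Greedy sketch of Strategy C17: keep t(e_k) unless it is a linear
    combination of the images already kept (that is, unless it fails to raise the rank)."""
    kept = []
    for e in domain_basis:
        w = t(e)
        if rank(kept + [w]) > len(kept):
            kept.append(w)
    return kept

def t(v):
    """Worked Exercise C59: t(x, y, z) = (x + 2y + 3z, 4x + y - 2z)."""
    x, y, z = v
    return (x + 2 * y + 3 * z, 4 * x + y - 2 * z)

standard = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
basis = image_basis(t, standard)
print(basis, len(basis))  # [(1, 4), (2, 1)] 2
```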


Once we know a basis for the image set of a linear transformation, we 
know everything that we need to know about the image set; in particular, 
we know its dimension. 


Worked Exercise C59
Let $t$ be the linear transformation
$$t:\mathbb{R}^3\to\mathbb{R}^2,\qquad (x,y,z)\mapsto(x+2y+3z,\;4x+y-2z).$$
Find a basis for $\operatorname{Im}t$ and state the dimension of $\operatorname{Im}t$.




Exercise C105 


For each of the following linear transformations $t$, find a basis for $\operatorname{Im}t$ and state the dimension of $\operatorname{Im}t$.
(a) $t:\mathbb{R}^2\to\mathbb{R}^2,\ (x,y)\mapsto(x,\;2x+y)$
(b) $t:P_3\to P_2,\ p(x)\mapsto p'(x)$
(c) $t:\mathbb{R}^3\to\mathbb{R}^3,\ (x,y,z)\mapsto(x+2y+3z,\;x+2z,\;x+y+2z)$


For the linear transformation $t$ in Exercise C105(c), $\operatorname{Im}t$ is a two-dimensional subspace of the codomain $\mathbb{R}^3$. Thus $\operatorname{Im}t$ is a plane through the origin with equation
$$ax+by+cz=0,$$
for some $a,b,c\in\mathbb{R}$ not all zero. It is possible to use the basis that you found for $\operatorname{Im}t$ in Exercise C105(c) to work out the values of $a$, $b$ and $c$. For example, using the basis $\{(1,1,1),(2,0,1)\}$ for $\operatorname{Im}t$, we can proceed as follows. Since the basis vectors belong to $\operatorname{Im}t$, the values $a$, $b$ and $c$ satisfy the system
$$\begin{aligned}a+b+c&=0\\2a+c&=0.\end{aligned}$$
The second equation gives $c=-2a$. Substituting this into the first equation gives $b=a$. So $\operatorname{Im}t$ is the plane with equation
$$ax+ay-2az=0$$
or, equivalently (since $a$, $b$ and $c$ are not all zero, $a\neq0$, so we can divide by $a$),
$$x+y-2z=0.$$
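In $\mathbb{R}^3$ there is a shortcut for the coefficients $(a,b,c)$: the cross product of two spanning vectors of the plane is orthogonal to both, so it is a normal vector to the plane. This is a different technique from the one used in the text (solving a linear system) and it works only in $\mathbb{R}^3$; the Python sketch below is illustrative and uses the basis $\{(1,1,1),(2,0,1)\}$ quoted above.

```python
def cross(u, v):
    """Cross product of two vectors in R^3."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def dot(u, v):
    """Dot product of two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))

normal = cross((1, 1, 1), (2, 0, 1))
print(normal)  # (1, 1, -2), so the plane is x + y - 2z = 0
```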


Finally, we note that a linear transformation $t:V\to W$ is onto when every element of $W$ is the image of an element of $V$; that is, a linear transformation is onto if and only if $\operatorname{Im}t=W$. Since $\operatorname{Im}t$ is a subspace of $W$, if $\dim(\operatorname{Im}t)=\dim W$ then we can immediately conclude that $\operatorname{Im}t=W$ and, conversely, if $\operatorname{Im}t=W$ then $\dim(\operatorname{Im}t)=\dim W$. Thus we have the following result.


Proposition C49 


A linear transformation $t:V\to W$, where $W$ is finite-dimensional, is onto if and only if $\dim(\operatorname{Im}t)=\dim W$.


Exercise C106 


Which of the following linear transformations are onto?
(a) $t:\mathbb{R}^2\to\mathbb{R}^2,\ (x,y)\mapsto(x,\;2x+y)$
(b) $t:P_3\to P_2,\ p(x)\mapsto p'(x)$
(c) $t:\mathbb{R}^3\to\mathbb{R}^3,\ (x,y,z)\mapsto(x+2y+3z,\;x+2z,\;x+y+2z)$

(These are the linear transformations from Exercise C105.)


4.2 Kernel of a linear transformation 


You have seen how to find the image set of a linear transformation $t:V\to W$. Now suppose that $\mathbf w$ belongs to the image set of $t$. How can we find all the vectors in $V$ that map to $\mathbf w$?

We begin by looking at the case when $\mathbf w$ is the zero vector. We know that $t(\mathbf 0)=\mathbf 0$, but it is possible that there are also some non-zero vectors in $V$ that are mapped to $\mathbf 0$.

For example, let $t$ be the linear transformation
$$t:\mathbb{R}^3\to\mathbb{R}^3,\qquad (x,y,z)\mapsto(x,y,0).$$
Then $t(x,y,z)=\mathbf 0$ if and only if $(x,y,0)=(0,0,0)$; that is, if and only if $x=0$ and $y=0$.

Thus the set of vectors that are mapped to $\mathbf 0$ is the whole of the $z$-axis. This set is a one-dimensional subspace of the domain $\mathbb{R}^3$. We call this set the kernel of $t$.


The first use of kernel in the context of algebra was by the Russian 
mathematician Lev Semyonovich Pontryagin (1908-1988) in 1931. 
Pontryagin, who lost his eyesight in an accident when he was fourteen, 
was one of the leading Russian mathematicians of the 

twentieth century. He made fundamental contributions to algebra, 
topology, and dynamical systems. 


Pontryagin’s choice of the term kernel appears unrelated to its use in 
other areas of mathematics (integral equations, Fourier analysis). 




Lev Semyonovich Pontryagin 




Definition
The kernel of a linear transformation $t:V\to W$ is the set
$$\operatorname{Ker}t=\{\mathbf v\in V:t(\mathbf v)=\mathbf 0\}.$$

Figure 29 illustrates that the kernel is the set of vectors of $V$ mapping to the zero vector of $W$.


Exercise C107 


Give a geometric description of the kernel of each of the following linear transformations. In each case, state whether the kernel is a subspace of the domain.
(a) $t:\mathbb{R}^3\to\mathbb{R}^2,\ (x,y,z)\mapsto(x,0)$
(b) $t:\mathbb{R}^2\to\mathbb{R}^2,\ (x,y)\mapsto(x,x)$


For each of the linear transformations in Exercise C107, the kernel is a subspace of the domain. This is true for all linear transformations.

Theorem C50
Let $t:V\to W$ be a linear transformation. Then $\operatorname{Ker}t$ is a subspace of the domain $V$.

Proof We use Strategy C10 in Unit C2.
First we show that $\mathbf 0\in\operatorname{Ker}t$.
Since $t$ is a linear transformation, $t(\mathbf 0)=\mathbf 0$, so $\mathbf 0\in\operatorname{Ker}t$. This is illustrated in Figure 30.
Next we show that $\operatorname{Ker}t$ is closed under vector addition.
Let $\mathbf v_1,\mathbf v_2\in\operatorname{Ker}t$. Since $t$ is a linear transformation,
$$t(\mathbf v_1+\mathbf v_2)=t(\mathbf v_1)+t(\mathbf v_2)=\mathbf 0+\mathbf 0=\mathbf 0,$$
so $\mathbf v_1+\mathbf v_2\in\operatorname{Ker}t$, as required. This is illustrated in Figure 31.
Finally, we show that $\operatorname{Ker}t$ is closed under scalar multiplication.
Let $\mathbf v\in\operatorname{Ker}t$ and $\alpha\in\mathbb{R}$. Since $t$ is a linear transformation,
$$t(\alpha\mathbf v)=\alpha t(\mathbf v)=\alpha\mathbf 0=\mathbf 0,$$
so $\alpha\mathbf v\in\operatorname{Ker}t$. This is illustrated in Figure 32.
Thus $\operatorname{Ker}t$ is a subspace of $V$. $\square$

Figure 29 The kernel maps to the zero vector

Figure 30 The vector $\mathbf 0$ is in the kernel

Figure 31 The kernel is closed under vector addition

Figure 32 The kernel is closed under scalar multiplication




When finding the kernel of a linear transformation, we often need to solve 
a system of linear equations; this sometimes involves using Gauss-Jordan 
elimination as in Unit C1. 


Worked Exercise C60 


Find the kernel and the dimension of the kernel of the linear 
transformation 


t: R? — R? 
(x,y,z) — (z + 2y + 3z, 4a + y — 22). 


Solution 

The kernel is the set of vectors (x, y, z) in R³ that satisfy
t(x, y, z) = (0, 0);

that is,
(x + 2y + 3z, 4x + y − 2z) = (0, 0).

Equating coordinates, we obtain the system

x + 2y + 3z = 0
4x + y − 2z = 0.


To solve this system we row-reduce the augmented matrix.

r1  ( 1  2   3 | 0 )
r2  ( 4  1  −2 | 0 )

r2 → r2 − 4r1
( 1   2    3 | 0 )
( 0  −7  −14 | 0 )

r2 → −(1/7)r2
( 1  2  3 | 0 )
( 0  1  2 | 0 )

r1 → r1 − 2r2
( 1  0  −1 | 0 )
( 0  1   2 | 0 )

The augmented matrix is in row-reduced form and we have

x − z = 0
y + 2z = 0.

Assigning the parameter k to the unknown z, we obtain

x = k, y = −2k, z = k.

So the kernel of t is
Ker t = {(k, −2k, k) : k ∈ R};
that is, Ker t is the line through (0, 0, 0) and (1, −2, 1).

Thus Ker t is a one-dimensional subspace of the domain R³.
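The row reduction above is mechanical, so it can be mirrored in a few lines of code. The following is a minimal Python sketch of our own (the helper names rref and kernel_basis are illustrative, not part of the module). It uses exact Fraction arithmetic so that each step matches a hand calculation, and it builds one kernel vector for each free unknown by giving that unknown the value 1, just as we assigned the parameter k to z.

```python
from fractions import Fraction


def rref(rows):
    """Gauss-Jordan elimination with exact arithmetic.

    Returns the row-reduced matrix and the list of pivot columns.
    """
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for c in range(len(m[0])):
        # Look for a row (at or below row r) with a non-zero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: the unknown is free
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]  # make the leading entry 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]  # clear column c
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def kernel_basis(rows):
    """Basis for the solution set of Ax = 0, one vector per free unknown."""
    m, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)  # give this free unknown the value k = 1
        for row, pc in zip(m, pivots):
            v[pc] = -row[f]  # solve each pivot equation for its unknown
        basis.append(v)
    return basis


# Matrix of t(x, y, z) = (x + 2y + 3z, 4x + y - 2z) from Worked Exercise C60
print(kernel_basis([[1, 2, 3], [4, 1, -2]]))
```

For this matrix the function returns a single vector, (1, −2, 1) (as Fraction values), so Ker t is one-dimensional, in agreement with the hand calculation.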




Unit C3 Linear transformations 


Exercise C108 


For each of the following linear transformations t, find the kernel of t and
the dimension of the kernel.

(a) t: R² → R²
(x, y) ↦ (x, 2x + y)

(b) t: R³ → R³
(x, y, z) ↦ (x + 2y + 3z, x + z, x + y + 2z)


We now look at examples involving vector spaces of polynomials. 


Worked Exercise C61 


Find the kernel of the linear transformation

t: P3 → P3
p(x) ↦ p(x) + p(2).

Solution
Let p(x) = a + bx + cx² be a polynomial in P3. Then
t(p(x)) = a + bx + cx² + a + 2b + 4c
= 2a + 2b + 4c + bx + cx².


The kernel of t is the set of polynomials in P3 that satisfy t(p(x)) = 0;
that is,

2a + 2b + 4c + bx + cx² = 0, for all x ∈ R.

Equating coefficients, we obtain the system

2a + 2b + 4c = 0
b = 0
c = 0.

Substituting b = 0 and c = 0 into the first equation gives a = 0. So
the only solution is a = 0, b = 0 and c = 0.

Thus the only polynomial in the kernel of t is the zero polynomial
p(x) = 0; that is,

Ker t = {0}.

(The kernel comprises just the zero vector, so it has dimension 0.)
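As a cross-check on Worked Exercise C61, we can describe t by its action on coefficient triples (a, b, c) for p(x) = a + bx + cx². The sketch below is our own and is not the method of the worked exercise. It builds the 3 × 3 matrix whose columns are the coefficients of t(1), t(x) and t(x²), then computes its determinant. A non-zero determinant means the matrix is invertible, so the homogeneous system has only the zero solution.

```python
def t_coeffs(a, b, c):
    """Coefficients of t(p) = p(x) + p(2) for p(x) = a + bx + cx^2 in P3."""
    p_at_2 = a + 2 * b + 4 * c
    # p(2) is a constant, so only the constant term changes
    return (a + p_at_2, b, c)


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Column j holds the coefficients of t applied to the j-th basis polynomial 1, x, x^2.
images = [t_coeffs(1, 0, 0), t_coeffs(0, 1, 0), t_coeffs(0, 0, 1)]
A = [[images[j][i] for j in range(3)] for i in range(3)]

print(A)        # [[2, 2, 4], [0, 1, 0], [0, 0, 1]]
print(det3(A))  # 2, non-zero, so t(p(x)) = 0 forces p(x) = 0
```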


Exercise C109 


Find the kernel and dimension of the kernel of the linear transformation
t: P3 → P2
p(x) ↦ p′(x).


For a given linear transformation t : V → W, we know how to find all the
vectors in V that map to 0 in W. Now suppose that b (≠ 0) is some
particular vector in W. How do we find all the vectors in V that map
to b? This is illustrated in Figure 33. There is a close relationship between
the vectors that map to b and those that map to 0: if we know one vector
a in V that maps to b, that is, t(a) = b, then every vector x in V that
maps to b may be written in the form x = a + k, where t(k) = 0, that is,
k ∈ Ker t. We state this powerful result formally in the following theorem;
the proof is short and constructive.


Theorem C51 Solution Set Theorem 


Let t: V → W be a linear transformation. Let b ∈ W and let a be
one vector in V that maps to b, that is, t(a) = b. Then the solution
set of the equation t(x) = b is

{x : x = a + k for some k ∈ Ker t}.

Proof (The proof is in two parts. We first show that the given set is a
subset of the solution set.)

First we show that each vector x of the given form is a solution of
t(x) = b. Let x = a + k, where k ∈ Ker t. Then

t(x) = t(a + k) = t(a) + t(k) = b + 0 = b.

(We now show that the solution set is a subset of the given set.)

Conversely, we show that each vector x in the solution set has the given
form. Let t(x) = b, where x ∈ V. Then

t(x − a) = t(x) − t(a) = b − b = 0,

so x − a ∈ Ker t; that is, x = a + k, for some k ∈ Ker t. ■
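The Solution Set Theorem can be checked on the transformation from Worked Exercise C60. In the sketch below, which is our own, the particular vector a = (1, 1, 1) and the range of k are arbitrary choices. The vector b is then defined as t(a), and every vector a + k(1, −2, 1) is confirmed to map to b. The last lines mirror the converse half of the proof for one further solution.

```python
def t(v):
    """t: R^3 -> R^2 from Worked Exercise C60."""
    x, y, z = v
    return (x + 2 * y + 3 * z, 4 * x + y - 2 * z)


a = (1, 1, 1)        # one particular vector (an arbitrary choice) ...
b = t(a)             # ... and the vector b that it maps to: (6, 3)
k_dir = (1, -2, 1)   # Ker t is the line spanned by this vector

# First half of the proof: every a + k with k in Ker t solves t(x) = b.
for k in range(-3, 4):
    x = tuple(ai + k * di for ai, di in zip(a, k_dir))
    assert t(x) == b

# Second half: another solution x differs from a by a vector in Ker t.
x = (2, -1, 2)
assert t(x) == b
difference = tuple(xi - ai for xi, ai in zip(x, a))
assert t(difference) == (0, 0)
print(b)  # (6, 3)
```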


Finally, we recall that a linear transformation t : V → W is one-to-one if
and only if no two elements in V have the same image. Thus we have the
following result.

Proposition C52

A linear transformation t is one-to-one if and only if Ker t = {0}.

Figure 33 The vectors of V mapping to b


Exercise C110 


Which of the following linear transformations are one-to-one?
(a) t: R² → R²
(x, y) ↦ (x, 2x + y)
(b) t: R³ → R³
(x, y, z) ↦ (x + 2y + 3z, x + z, x + y + 2z)
(c) t: P3 → P2
p(x) ↦ p′(x)
(You found the kernels of these in Exercises C108 and C109.)


4.3 Dimension Theorem 


You have seen that a linear transformation t : V → W has two particular
subspaces associated with it: Ker t in the domain V and Im t in the
codomain W, as shown in Figure 34.

Figure 34 The subspaces Ker t and Im t

There is a remarkable connection between the dimensions of these two
subspaces and the dimension of the domain V.
Let t be the linear transformation
t: R³ → R³
(x, y, z) ↦ (x, y, 0).
You have seen that for this linear transformation:
• the image set of t is the (x, y)-plane, so dim(Im t) = 2
• the kernel of t is the z-axis, so dim(Ker t) = 1.
Thus
dim(Im t) + dim(Ker t) = 2 + 1 = 3,
which is the dimension of the domain R³.
Now let t be the linear transformation
t: R³ → R²
(x, y, z) ↦ (x + 2y + 3z, 4x + y − 2z).


You have seen that for this linear transformation:
• the image set of t is the whole of R², so dim(Im t) = 2
• the kernel of t is the line through (0, 0, 0) and (1, −2, 1), so
dim(Ker t) = 1.

Thus
dim(Im t) + dim(Ker t) = 2 + 1 = 3,

which is the dimension of the domain R³.


Exercise C111 


For each of the following linear transformations t, calculate
dim(Im t) + dim(Ker t)
and compare your answer with the dimension of the domain of t.
(a) t: R² → R²
(x, y) ↦ (x, 2x + y)
(b) t: P3 → P2
p(x) ↦ p′(x)
(c) t: R³ → R³
(x, y, z) ↦ (x + 2y + 3z, x + z, x + y + 2z)

(You found the bases and dimensions of the image sets in Exercise C105,
and the kernels and dimensions of the kernels in Exercises C108
and C109.)


For each of the linear transformations in Exercise C111, the dimension of 
the image set plus the dimension of the kernel is equal to the dimension of 
the domain. This relationship holds for all linear transformations. We 
state this result in the next theorem; if you are short of time you should 
skim through this proof and come back to it when time permits. 


Theorem C53 Dimension Theorem 


Let t: V → W be a linear transformation. Then

dim(Im t) + dim(Ker t) = dim V.

Proof Let dim V = n and dim(Ker t) = k.
(We show that dim(Im t) = n − k.)

Let {e₁, ..., eₖ} be a basis for Ker t. We can extend this basis, by
Theorem C26 in Unit C2, to give a basis {e₁, ..., eₙ} for V. We prove that

F = {t(eₖ₊₁), ..., t(eₙ)}

is a basis for Im t, which shows that dim(Im t) = n − k.




(A diagram helps here: see Figure 35.)

Figure 35 Bases for Ker t in V and Im t in W

To show that F is a basis for Im t, we use Strategy C8 in Unit C2.

First we prove that F spans Im t. We know from Subsection 4.1 that
{t(e₁), ..., t(eₙ)} spans Im t. Since e₁, ..., eₖ belong to Ker t, we know that

t(e₁) = t(e₂) = ··· = t(eₖ) = 0,

so the span of {t(e₁), ..., t(eₙ)} is equal to the span of
{t(eₖ₊₁), ..., t(eₙ)}. Thus F spans Im t.

Next we show that F is a linearly independent set. We must show that if
αₖ₊₁t(eₖ₊₁) + αₖ₊₂t(eₖ₊₂) + ··· + αₙt(eₙ) = 0,

then

αₖ₊₁ = αₖ₊₂ = ··· = αₙ = 0.
Since t is a linear transformation, we have
αₖ₊₁t(eₖ₊₁) + ··· + αₙt(eₙ) = t(αₖ₊₁eₖ₊₁ + ··· + αₙeₙ).
So if
αₖ₊₁t(eₖ₊₁) + ··· + αₙt(eₙ) = 0,
then
t(αₖ₊₁eₖ₊₁ + ··· + αₙeₙ) = 0.
Thus
αₖ₊₁eₖ₊₁ + ··· + αₙeₙ ∈ Ker t.

Since {e₁, ..., eₖ} is a basis for Ker t, there exist real numbers α₁, ..., αₖ
such that

αₖ₊₁eₖ₊₁ + ··· + αₙeₙ = α₁e₁ + ··· + αₖeₖ,
so

α₁e₁ + ··· + αₖeₖ − αₖ₊₁eₖ₊₁ − ··· − αₙeₙ = 0.

Since {e₁, ..., eₙ} is a basis for V and so is linearly independent, it follows
that
α₁ = ··· = αₖ = −αₖ₊₁ = ··· = −αₙ = 0.
Thus
αₖ₊₁ = αₖ₊₂ = ··· = αₙ = 0,
as required.
Thus F is a basis for Im t, so dim(Im t) + dim(Ker t) = dim V. ■
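The Dimension Theorem can be illustrated numerically, though of course not proved this way. In the sketch below, which is our own, dim(Im t) is found by row-reducing the columns of the matrix of t (these span Im t, as in Subsection 4.1), and dim(Ker t) is found by counting the free unknowns of Ax = 0. The examples are matrices of transformations from this section with respect to standard bases; the differentiation example assumes the standard bases {1, x, x²} for P3 and {1, x} for P2.

```python
from fractions import Fraction


def pivot_columns(rows):
    """Row-reduce a copy of the matrix (exact arithmetic); return its pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot: column c corresponds to a free unknown
        m[r], m[p] = m[p], m[r]
        for i in range(len(m)):
            if i != r:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return pivots


def dim_image(A):
    """Im t is spanned by the columns of A; row-reduce them (as rows) to count a basis."""
    columns_as_rows = [list(col) for col in zip(*A)]
    return len(pivot_columns(columns_as_rows))


def dim_kernel(A):
    """One kernel basis vector per free unknown of Ax = 0."""
    return len(A[0]) - len(pivot_columns(A))


examples = {
    "(x, y, z) -> (x, y, 0)": [[1, 0, 0], [0, 1, 0], [0, 0, 0]],
    "(x, y, z) -> (x + 2y + 3z, 4x + y - 2z)": [[1, 2, 3], [4, 1, -2]],
    "(x, y, z) -> (x + 2y + 3z, x + z, x + y + 2z)": [[1, 2, 3], [1, 0, 1], [1, 1, 2]],
    "p(x) -> p'(x), P3 -> P2": [[0, 1, 0], [0, 0, 2]],
}
for name, A in examples.items():
    n = len(A[0])
    print(name, dim_image(A), dim_kernel(A), n)
    assert dim_image(A) + dim_kernel(A) == n  # Dimension Theorem
```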


The Dimension Theorem is an important result and has several
applications. For example, using the Dimension Theorem we can obtain
information on whether a linear transformation t : V → W is one-to-one
and/or onto.

Propositions C49 and C52 state that:
• t is onto if and only if dim(Im t) = dim W
• t is one-to-one if and only if Ker t = {0}.

Suppose that t : V → W is a linear transformation from the
n-dimensional vector space V to the m-dimensional vector space W, as
illustrated in Figure 36.

We consider the three cases: n > m, n < m and n = m.


Case (a): n > m

Since the image set of t is a subspace of W, we have dim(Im t) ≤ m. It
follows from the Dimension Theorem that

dim(Ker t) = dim V − dim(Im t) ≥ n − m > 0.
Thus Ker t ≠ {0}, so t is not one-to-one, as illustrated in Figure 37.
For example, the linear transformation
t: R³ → R²
(x, y, z) ↦ (2x + y, x + z)

is not one-to-one, since the dimension of the codomain (which is 2) is less
than the dimension of the domain (which is 3).

This linear transformation is onto because dim(Im t) = 2 = dim R².
However, in general, a linear transformation with n > m may or may not
be onto.


Figure 36 A linear transformation from V to W, where dim V = n and dim W = m

Figure 37 The case dim V > dim W: Ker t ≠ {0}, so t is not one-to-one

Figure 38 The case dim V < dim W: Im t ≠ W, so t is not onto

Figure 39 The case dim V = dim W and Ker t = {0}: Im t = W, so t is both one-to-one and onto

Figure 40 The case dim V = dim W and Ker t ≠ {0}: Im t ≠ W, so t is neither one-to-one nor onto


Case (b): n < m
By the Dimension Theorem,
dim(Im t) = dim V − dim(Ker t) ≤ n < m.
Thus Im t is not the whole of the m-dimensional vector space W, so t is
not onto, as illustrated in Figure 38.
For example, the linear transformation
t: R² → R³
(x, y) ↦ (2x, x + y, y)

is not onto, since the dimension of the codomain (which is 3) is greater
than the dimension of the domain (which is 2).

This linear transformation is one-to-one because dim(Im t) = 2 = dim R²,
so dim(Ker t) = 2 − 2 = 0 by the Dimension Theorem.
However, in general, a linear transformation with n < m may or may not
be one-to-one.
Case (c): n = m
By the Dimension Theorem,
dim(Im t) + dim(Ker t) = dim V = n = m.
There are two possibilities:
• dim(Ker t) = 0 and dim(Im t) = n = m
• dim(Ker t) > 0 and dim(Im t) < m.
If dim(Ker t) = 0 and dim(Im t) = n = m, then
Ker t = {0} and Im t = W.
Thus t is both one-to-one and onto, as illustrated in Figure 39.
For example, consider the linear transformation from Exercise C105(a),
t: R² → R²
(x, y) ↦ (x, 2x + y).
Here the domain and codomain both have dimension 2, dim(Ker t) = 0 and
dim(Im t) = 2. The latter is equal to the dimension of the codomain, so t is
both one-to-one and onto.

If, on the other hand, dim(Ker t) > 0 and dim(Im t) < m, then
Ker t ≠ {0} and Im t is not the whole of W.
Thus t is neither one-to-one nor onto, as illustrated in Figure 40.
For example, consider the linear transformation from Exercise C105(c),
t: R³ → R³
(x, y, z) ↦ (x + 2y + 3z, x + z, x + y + 2z).

Here the domain and codomain both have dimension 3, dim(Ker t) = 1 and
dim(Im t) = 2. The latter is less than the dimension of the codomain of t,
and thus t is neither one-to-one nor onto.


We summarise these findings in the following theorem. 


Theorem C54 


Let t: V → W be a linear transformation from an n-dimensional
vector space V to an m-dimensional vector space W.

(a) If n > m, then t is not one-to-one: Ker t ≠ {0}.
(b) If n < m, then t is not onto: Im t ≠ W.
(c) If n = m, then
• either t is both one-to-one and onto:
Ker t = {0} and Im t = W
• or t is neither one-to-one nor onto:
Ker t ≠ {0} and Im t ≠ W.
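The case analysis behind Theorem C54 is pure bookkeeping with the Dimension Theorem, so it can be checked exhaustively for small dimensions. The sketch below is our own. The function classify takes n, m and dim(Ker t) as inputs rather than computing them from a transformation, and the loop confirms each part of the theorem for all n, m ≤ 5.

```python
def classify(n, m, dim_kernel):
    """Apply the Dimension Theorem to t: V -> W with dim V = n and dim W = m.

    Returns the pair (one_to_one, onto).
    """
    dim_image = n - dim_kernel      # Dimension Theorem (Theorem C53)
    one_to_one = dim_kernel == 0    # Proposition C52
    onto = dim_image == m           # Proposition C49
    return one_to_one, onto


# Since Im t is a subspace of W, dim(Im t) <= m, so dim(Ker t) lies
# between max(0, n - m) and n.
for n in range(6):
    for m in range(6):
        for k in range(max(0, n - m), n + 1):
            one_to_one, onto = classify(n, m, k)
            if n > m:
                assert not one_to_one       # part (a)
            elif n < m:
                assert not onto             # part (b)
            else:
                assert one_to_one == onto   # part (c): both or neither

# Examples from this subsection, as (n, m, dim Ker t)
print(classify(3, 2, 1))  # (False, True): not one-to-one, but onto
print(classify(2, 3, 0))  # (True, False): one-to-one, but not onto
print(classify(3, 3, 1))  # (False, False): neither
```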


Exercise C112 


What can we deduce from Theorem C54 about the following linear
transformations?

(a) t: R² → R³
(x, y) ↦ (x, y, x + y)
(b) t: R² → R²
(x, y) ↦ (3x, 4x + y)
(c) t: P3 → P2
p(x) ↦ p′(x)

Systems of linear equations

You will now see how we can use linear transformations to obtain
information on the number of solutions of a system of linear equations.


Suppose that we want to know how many solutions there are to the
following system of three linear equations in three unknowns:
2x + 3y + 4z = 7
x + 5y + 6z = 4
3x + 2y + 5z = 1.

This system can be written in matrix form as

( 2  3  4 ) ( x )   ( 7 )
( 1  5  6 ) ( y ) = ( 4 )
( 3  2  5 ) ( z )   ( 1 )

Now consider the linear transformation

t: R³ → R³
(x, y, z) ↦ A(x, y, z),

where A is the 3 × 3 matrix of coefficients above and (x, y, z) is written as a
column vector.

We see that (x, y, z) is a solution of the system of equations precisely when
t(x, y, z) = (7, 4, 1).
Thus the number of solutions of the system of equations is the same as the
number of vectors in R³ that map to the vector (7, 4, 1) under t.


In general, suppose that we want to know how many solutions there are to
the system of m linear equations in n unknowns with the matrix equation

Ax = b.
Let t be the linear transformation with the matrix representation
t: Rⁿ → Rᵐ
x ↦ Ax.

Then the number of solutions of the system of equations is the same as the
number of vectors that map to b under t.

Suppose b ∈ Im t. Then there is some vector a ∈ Rⁿ such that t(a) = b.
Then, using the Solution Set Theorem (Theorem C51), the solution set to
the system of equations is

{x : x = a + k for some k ∈ Ker t}.

Now Ker t is a subspace of Rⁿ, by Theorem C50. A subspace of Rⁿ of
dimension 0 comprises just the zero vector. A subspace of Rⁿ of dimension
greater than 0 comprises infinitely many vectors, since it is a line, a plane
or a higher-dimensional space. So Ker t contains either just the zero vector
or infinitely many vectors.

It follows that there are three possibilities for the number of solutions:
• if b ∈ Im t and Ker t = {0}, then there is exactly one solution
• if b ∈ Im t and Ker t ≠ {0}, then there are infinitely many solutions
• if b ∉ Im t, then there are no solutions.

Thus a system of linear equations has no solutions, or one solution, or
infinitely many solutions. This result was stated without proof in
Subsection 1.2 of Unit C1.
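The three possibilities can be turned into a small decision procedure. The following sketch is our own and uses a standard criterion that is not stated in this unit: b lies in Im t exactly when appending b to A as an extra column does not increase the number of pivots (often called the rank), and Ker t = {0} exactly when A has n pivots. It is applied here to the system of three equations in three unknowns above.

```python
from fractions import Fraction


def rank(rows):
    """Number of pivots found by Gauss-Jordan elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(len(m)):
            if i != r:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * x for a, x in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def number_of_solutions(A, b):
    """Return 0, 1 or 'infinitely many' for the system Ax = b."""
    n = len(A[0])
    augmented = [list(row) + [bi] for row, bi in zip(A, b)]
    if rank(augmented) > rank(A):
        return 0                  # b is not in Im t
    if rank(A) == n:
        return 1                  # b is in Im t and Ker t = {0}
    return "infinitely many"      # b is in Im t and Ker t != {0}


A = [[2, 3, 4], [1, 5, 6], [3, 2, 5]]
print(number_of_solutions(A, [7, 4, 1]))
```

Here the coefficient matrix has three pivots, so the function reports exactly one solution.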


Exercise C113 


How many solutions are there to the following system of three linear
equations in three unknowns?

x + 2y + 3z = 1
x + z = 1
x + y + 2z = 1

Use your solutions to Exercises C105(c) and C108(b).

By considering the linear transformation
t: Rⁿ → Rᵐ
x ↦ Ax,

we can show that the number of solutions of the system Ax = b of
m linear equations in n unknowns depends on the values of m and n. We
consider three cases: n > m, n < m and n = m.


Case (a): n > m
It follows from Theorem C54 that Ker t ≠ {0}. Thus the equation Ax = b
has either no solution (if b ∉ Im t) or infinitely many solutions (if
b ∈ Im t). For example, the system

2x + y + z = a
4x + 2y + 2z = b,

of two equations in three unknowns has either no solution or infinitely
many solutions, depending on the values of a and b. For example, the
system has no solution when a = 3 and b = 4, and infinitely many
solutions when a = 2 and b = 4.

Case (b): n < m

It follows from Theorem C54 that Im t ≠ Rᵐ. Thus there is some b for
which the equation Ax = b has no solutions. For example, there are some
values of a, b and c for which the system

2x + y = a
x + 3y = b
4x + y = c,

of three equations in two unknowns has no solutions. For example, the
system has no solutions when a = 3, b = 4 and c = 2.

Case (c): n = m
It follows from Theorem C54 that there are two possibilities.
If Ker t = {0} and Im t = Rⁿ, then the equation Ax = b has exactly one
solution for each b. For example, the system

x + y = a
y = b,

of two equations in two unknowns has exactly one solution, namely
(x, y) = (a − b, b), for each pair of values (a, b).

If Ker t ≠ {0} and Im t ≠ Rⁿ, then there exist vectors b for which the
equation Ax = b has no solutions, and for all other b, the equation
Ax = b has infinitely many solutions. Consider the system

x + 2y = a
2x + 4y = b,
of two equations in two unknowns. Since 2x + 4y = 2(x + 2y), these
equations have no solution when b ≠ 2a. When b = 2a, however, putting
y = k gives (x, y) = (a − 2k, k), where k ∈ R, as a solution of the
equations; thus there are infinitely many solutions.




We summarise these results below. 


Theorem C55 

Let Ax = b be a system of m linear equations in n unknowns.

(a) If n > m, then Ax = b has either no solution or infinitely many
solutions.

(b) If n < m, then there is some b for which Ax = b has no
solutions.

(c) If n = m, then:
• either Ax = b has exactly one solution for each b
• or there are some b for which Ax = b has no solutions;
for all other b, Ax = b has infinitely many solutions.

Exercise C114

What can you deduce from Theorem C55 about the number of solutions of
each of the following systems of linear equations?

(a) 3x + y + z = 1
4x + 2y + 4z = 3

(b) 3x + y + z = a
4x + 2y + 4z = b
5x + y + 6z = c


Summary 


In this unit you have seen that linear transformations are functions
between vector spaces that preserve linear combinations of vectors, and
that for finite-dimensional vector spaces they are precisely the functions
that have matrix representations. Using properties of matrices you have
investigated invertible linear transformations. You have seen that
finite-dimensional vector spaces are isomorphic if and only if their
dimensions are equal, and hence that all vector spaces of dimension n are
isomorphic to Rⁿ. You have met the Dimension Theorem, the important
result that the sum of the dimensions of the image set and the kernel is equal to
the dimension of the domain. In addition, you have seen that linear
transformations can be used to prove that matrix multiplication is
associative and to help determine the number of solutions of a system of
linear equations.


Learning outcomes 


After working through this unit, you should be able to:

• explain what is meant by a linear transformation and understand that
linear transformations preserve the zero vector and linear combinations
of vectors
• recognise simple linear transformations of the plane
• determine whether or not a given function is a linear transformation
• understand that the matrix representation of a linear transformation
t: V → W depends on the bases used for V and W
• find the matrix representation, with respect to given bases, of a linear
transformation between finite-dimensional vector spaces
• understand the relationship between matrices and linear transformations
• use the matrix representations of two given linear transformations s and
t to find a matrix representation of the composite function s ∘ t
• determine whether a given linear transformation is invertible and, if it is,
find its inverse
• understand that each n-dimensional vector space is isomorphic to Rⁿ
• explain the meaning of the terms image set and kernel of a linear
transformation
• find a basis for the image set of a given linear transformation and find
the kernel of a given linear transformation
• understand the relationship between the dimension of the image set, the
dimension of the kernel and the dimension of the domain of a linear
transformation
• understand that the number of solutions of a system of m linear
equations in n unknowns depends on the values of m and n.




Solutions to exercises 


Solution to Exercise C82
(a) This is a (2, 3)-scaling.

(b) This is q₀, a reflection in the x-axis; it is also a
(1, −1)-scaling.


Solution to Exercise C83 


(a) First t(0) = 0, so t may be a linear
transformation.

Next we check whether t satisfies LT1:
t(v₁ + v₂) = t(v₁) + t(v₂), for all v₁, v₂ ∈ R².
In R², let v₁ = (x₁, y₁) and v₂ = (x₂, y₂). Then
t(v₁ + v₂) = t(x₁ + x₂, y₁ + y₂)
= (x₁ + x₂ + 3(y₁ + y₂), y₁ + y₂)
= (x₁ + x₂ + 3y₁ + 3y₂, y₁ + y₂)

and

t(v₁) + t(v₂) = (x₁ + 3y₁, y₁) + (x₂ + 3y₂, y₂)
= (x₁ + x₂ + 3y₁ + 3y₂, y₁ + y₂).

These expressions are equal, so LT1 is satisfied.

Finally, we check whether t satisfies LT2:

t(αv) = αt(v), for all v ∈ R², α ∈ R.

Let v = (x, y) be a vector in R² and let α ∈ R.
Then

t(αv) = t(αx, αy) = (αx + 3αy, αy)
and

αt(v) = α(x + 3y, y) = (αx + 3αy, αy).
These expressions are equal, so LT2 is satisfied.

Since both LT1 and LT2 are satisfied, t is a linear
transformation.

(b) Since t(0) = t(0, 0) = (2, 1) ≠ 0, it follows
from Strategy C14 that t is not a linear
transformation.


Solution to Exercise C84 
We use Strategy C14.

(a) First t(0) = 0, so t may be a linear
transformation.

Next we check whether t satisfies LT1:
t(v₁ + v₂) = t(v₁) + t(v₂), for all v₁, v₂ ∈ R².
In R², let v₁ = (x₁, y₁) and v₂ = (x₂, y₂). Then

t(v₁ + v₂) = t(x₁ + x₂, y₁ + y₂)
= (x₁ + x₂, y₁ + y₂, x₁ + x₂, y₁ + y₂)
and

t(v₁) + t(v₂) = (x₁, y₁, x₁, y₁) + (x₂, y₂, x₂, y₂)
= (x₁ + x₂, y₁ + y₂, x₁ + x₂, y₁ + y₂).

These expressions are equal, so LT1 is satisfied.

Finally, we check whether t satisfies LT2:
t(αv) = αt(v), for all v ∈ R², α ∈ R.

Let v = (x, y) be a vector in R² and let α ∈ R.
Then

t(αv) = t(αx, αy) = (αx, αy, αx, αy)
and
αt(v) = α(x, y, x, y) = (αx, αy, αx, αy).

These expressions are equal, so LT2 is satisfied.

Since both LT1 and LT2 are satisfied, t is a linear
transformation.

(b) First t(0) = 0, so t may be a linear
transformation.

Next we check whether t satisfies LT1:
t(v₁ + v₂) = t(v₁) + t(v₂), for all v₁, v₂ ∈ R³.
In R³, let v₁ = (x₁, y₁, z₁) and v₂ = (x₂, y₂, z₂).
Then
t(v₁ + v₂) = t(x₁ + x₂, y₁ + y₂, z₁ + z₂)
= (x₁ + x₂)²
= x₁² + x₂² + 2x₁x₂
and
t(v₁) + t(v₂) = x₁² + x₂².
These expressions are not equal in general (for example, they differ
when x₁ = x₂ = 1), so LT1 is not satisfied.
Thus t is not a linear transformation.

(c) Since t(0) = t(0, 0, 0) = (0, 0, 0, 1) ≠ 0, it
follows that t is not a linear transformation.


Solution to Exercise C85 


First we show that t satisfies LT1:

t(v₁ + v₂) = t(v₁) + t(v₂), for all v₁, v₂ ∈ R³.
In R³, let v₁ = (x₁, y₁, z₁) and v₂ = (x₂, y₂, z₂).
Then

t(v₁ + v₂)
= t(x₁ + x₂, y₁ + y₂, z₁ + z₂)
= ((x₁ + x₂) cos θ − (y₁ + y₂) sin θ,
(x₁ + x₂) sin θ + (y₁ + y₂) cos θ, z₁ + z₂)

and
t(v₁) + t(v₂)
= (x₁ cos θ − y₁ sin θ, x₁ sin θ + y₁ cos θ, z₁)
+ (x₂ cos θ − y₂ sin θ, x₂ sin θ + y₂ cos θ, z₂)
= ((x₁ + x₂) cos θ − (y₁ + y₂) sin θ,
(x₁ + x₂) sin θ + (y₁ + y₂) cos θ, z₁ + z₂).

These expressions are equal, so LT1 is satisfied.
Next we show that t satisfies LT2:
t(αv) = αt(v), for all v ∈ R³, α ∈ R.

Let v = (x, y, z) be a vector in R³ and let α ∈ R.
Then

t(αv) = t(αx, αy, αz)
= (αx cos θ − αy sin θ, αx sin θ + αy cos θ, αz)

and

αt(v) = α(x cos θ − y sin θ, x sin θ + y cos θ, z)
= (αx cos θ − αy sin θ, αx sin θ + αy cos θ, αz).

These expressions are equal, so LT2 is satisfied.

Since LT1 and LT2 are satisfied, t is a linear
transformation.
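A numerical spot check can catch slips in algebra like that above, although it proves nothing: it only tests LT1 and LT2 at finitely many randomly chosen vectors and scalars. The sketch below is our own; θ = 0.7, the sampling range, the seed and the tolerances are arbitrary choices.

```python
import math
import random


def rotate_z(v, theta):
    """t(x, y, z) = (x cos θ - y sin θ, x sin θ + y cos θ, z)."""
    x, y, z = v
    c, s = math.cos(theta), math.sin(theta)
    return (x * c - y * s, x * s + y * c, z)


def close(u, v):
    """Componentwise comparison allowing for floating-point rounding."""
    return all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9) for a, b in zip(u, v))


def spot_check_linear(f, trials=100, seed=0):
    """Test LT1 and LT2 at random vectors in R^3 and random scalars (evidence, not proof)."""
    rng = random.Random(seed)
    for _ in range(trials):
        v1 = [rng.uniform(-10, 10) for _ in range(3)]
        v2 = [rng.uniform(-10, 10) for _ in range(3)]
        alpha = rng.uniform(-10, 10)
        lt1 = close(f([a + b for a, b in zip(v1, v2)]),
                    [a + b for a, b in zip(f(v1), f(v2))])
        lt2 = close(f([alpha * a for a in v1]), [alpha * a for a in f(v1)])
        if not (lt1 and lt2):
            return False
    return True


print(spot_check_linear(lambda v: rotate_z(v, 0.7)))  # True
```

A single failure is enough to show that a function is not linear (up to rounding), whereas passing the check is only evidence.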


Solution to Exercise C86 
We use Strategy C14.

Since the zero element of P3 is p(x) = 0, we have
p(2) = 0 and thus t(0) = 0; so t may be a linear
transformation.

Next we check whether t satisfies LT1:

t(p(x) + q(x)) = t(p(x)) + t(q(x)),
for all p(x), q(x) ∈ P3.

Let p(x), q(x) ∈ P3. Then
t(p(x) + q(x)) = p(x) + q(x) + p(2) + q(2)
and
t(p(x)) + t(q(x)) = p(x) + p(2) + q(x) + q(2)
= p(x) + q(x) + p(2) + q(2).
These expressions are equal, so LT1 is satisfied.

Finally, we check whether t satisfies LT2:

t(αp(x)) = αt(p(x)), for all p(x) ∈ P3, α ∈ R.
Let p(x) ∈ P3 and α ∈ R. Then

t(αp(x)) = αp(x) + αp(2)
and

αt(p(x)) = α(p(x) + p(2)) = αp(x) + αp(2).
These expressions are equal, so LT2 is satisfied.

Since both LT1 and LT2 are satisfied, t is a linear
transformation.


Solution to Exercise C87
First we show that i_V satisfies LT1:

i_V(v₁ + v₂) = i_V(v₁) + i_V(v₂),
for all v₁, v₂ ∈ V.

Let v₁, v₂ ∈ V. Then
i_V(v₁ + v₂) = v₁ + v₂
and
i_V(v₁) + i_V(v₂) = v₁ + v₂.
These expressions are equal, so LT1 is satisfied.

Next we show that i_V satisfies LT2:

i_V(αv) = αi_V(v), for all v ∈ V, α ∈ R.
Let v ∈ V and α ∈ R. Then
i_V(αv) = αv
and
αi_V(v) = αv.
These expressions are equal, so LT2 is satisfied.

Since LT1 and LT2 are satisfied, i_V is a linear
transformation.

Solution to Exercise C88
We can write any vector (x, y) in R² in the form
(x, y) = x(1, 0) + y(0, 1).
It follows from Theorem C39 that

q_φ(x, y) = q_φ(x(1, 0) + y(0, 1))
= x q_φ(1, 0) + y q_φ(0, 1)
= x(cos 2φ, sin 2φ) + y(sin 2φ, −cos 2φ)
= (x cos 2φ + y sin 2φ, x sin 2φ − y cos 2φ).

Solution to Exercise C89
(a) Here E = {(3, 1), (2, 1)}. Therefore,
v = (3, 1) = 1(3, 1) + 0(2, 1),
so
v_E = (1, 0)_E.
(b) Here E = {(1, 2), (2, 1)}. We must find
a, b ∈ R such that
(3, 1) = (a, b)_E.
Since
(a, b)_E = a(1, 2) + b(2, 1) = (a + 2b, 2a + b),
equating corresponding coordinates gives the
system
a + 2b = 3
2a + b = 1.
Solving, we have a = −1/3 and b = 5/3, so
v_E = (−1/3, 5/3)_E.

Solution to Exercise C90
(a) Here E = {1, x}. Therefore,
p(x) = 2 + 3x = 2 × (1) + 3 × (x),
so the E-coordinate representation of p(x) is
(2, 3)_E.

(b) Here E = {1, 4 + 6x}. Therefore,
p(x) = 2 + 3x = 0 × (1) + 1/2 × (4 + 6x),
so the E-coordinate representation of p(x) is
(0, 1/2)_E.

(c) Here E = {2x, 1 + 4x}. We must find a, b ∈ R
such that
p(x) = 2 + 3x = (a, b)_E.
Since
(a, b)_E = a × 2x + b × (1 + 4x)
= b + (2a + 4b)x,
equating corresponding coefficients gives the
system
b = 2
2a + 4b = 3.
Solving, we have a = −5/2 and b = 2. Thus the
E-coordinate representation of p(x) is
(−5/2, 2)_E.

Solution to Exercise C91
(a) We have
( 3  0 ) ( 1 )   ( 3 )
( 0  2 ) ( 0 ) = ( 0 ),
so t(1, 0) = (3, 0).


Similarly,
( 3  0 ) ( 0 )   ( 0 )
( 0  2 ) ( 1 ) = ( 2 ),
so t(0, 1) = (0, 2).
Thus the coordinates of t(1, 0) form the first
column of the matrix of t, and the coordinates of
t(0, 1) form the second column of the matrix of t.


(b) We have 


Similarly, 
0 + -4\ 2 
Ga OR 
v2 v2 v2 


— OE A al 
SO t(0, 1) (-4.5) z011). 
As in part (a), the coordinates of t(1, 0) form the
first column of the matrix of t, and the coordinates
of t(0, 1) form the second column of the matrix of t.


Solution to Exercise C92
We use Strategy C15.
(a) We find the images of the vectors in the
domain basis E = {(1, 0), (0, 1)}:

t(1, 0) = (1, 0), t(0, 1) = (3, 1).
We find the F-coordinates of each of these image
vectors, where F = {(1, 0), (0, 1)}:

t(1, 0) = (1, 0)_F, t(0, 1) = (3, 1)_F.

Hence the matrix of t with respect to the standard
bases for the domain and codomain is

A = ( 1  3 )
    ( 0  1 )

Thus the matrix representation of t with respect to
these bases is

t: R² → R²
(x, y) ↦ A(x, y) = (x + 3y, y),

where vectors are written as columns when multiplied by A.


(b) We find the images of the vectors in the
domain basis E = {1, x, x²}. The first basis vector
is the constant polynomial p₀(x) = 1, for which
p₀(2) = 1. The second basis vector is p₁(x) = x, for
which p₁(2) = 2; and the third basis vector is
p₂(x) = x², for which p₂(2) = 4. Thus
t(1) = 1 + 1 = 2, t(x) = x + 2,
t(x²) = x² + 4.
We find the F-coordinates of each of these image
vectors, where F = {1, x, x²}:

t(1) = (2, 0, 0)_F, t(x) = (2, 1, 0)_F,
t(x²) = (4, 0, 1)_F.
Hence the matrix of t with respect to the standard
bases for the domain and codomain is

    ( 2  2  4 )
A = ( 0  1  0 )
    ( 0  0  1 )

Thus the matrix representation of t with respect to
these bases is

t: P3 → P3
(a, b, c)_E ↦ A(a, b, c) = (2a + 2b + 4c, b, c)_F.

(Notice that
t(a + bx + cx²) = (a + bx + cx²) + (a + 2b + 4c)
= a + bx + cx² + a + 2b + 4c
= 2a + 2b + 4c + bx + cx².)

(c) We find the images of the vectors in the
domain basis E = {(1, 0), (0, 1)}:

t(1, 0) = (1, 0, 1, 0), t(0, 1) = (0, 1, 0, 1).

We find the F-coordinates of each of these image
vectors, where
F = {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)}:
t(1, 0) = (1, 0, 1, 0)_F, t(0, 1) = (0, 1, 0, 1)_F.

Hence the matrix of t with respect to the standard
bases for the domain and codomain is

    ( 1  0 )
A = ( 0  1 )
    ( 1  0 )
    ( 0  1 )

Thus the matrix representation of t with respect to
these bases is

t: R² → R⁴
(x, y) ↦ A(x, y) = (x, y, x, y).


(d) We find the images of the vectors in the
domain basis E = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}:
t(1, 0, 0) = (1, 0), t(0, 1, 0) = (0, 1),
t(0, 0, 1) = (0, 0).

We find the F-coordinates of each of these image
vectors, where F = {(1, 0), (0, 1)}:

t(1, 0, 0) = (1, 0)_F, t(0, 1, 0) = (0, 1)_F,
t(0, 0, 1) = (0, 0)_F.

Hence the matrix of t with respect to the standard
bases for the domain and codomain is

A = ( 1  0  0 )
    ( 0  1  0 )

Thus the matrix representation of t with respect to
these bases is

t: R³ → R²
(x, y, z) ↦ A(x, y, z) = (x, y).
Solution to Exercise C93
We use Strategy C15.
(a) We find the images of the vectors in the
domain basis E = {(1, 0, 1), (1, 0, 0), (1, 1, 1)}:
t(1, 0, 1) = (1, 0), t(1, 0, 0) = (1, 0),
t(1, 1, 1) = (1, 1).
We find the F-coordinates of each of these image
vectors, where F = {(1, 0), (0, 1)}:
t(1, 0, 1) = (1, 0)_F, t(1, 0, 0) = (1, 0)_F,
t(1, 1, 1) = (1, 1)_F.

Hence the matrix of t with respect to the bases E
and F is

A = ( 1  1  1 )
    ( 0  0  1 )

(b) We find the images of the vectors in the
domain basis E = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}:

t(1, 0, 0) = (1, 0), t(0, 1, 0) = (0, 1),
t(0, 0, 1) = (0, 0).
We find the F-coordinates of each of these image
vectors, where F = {(2, 1), (1, 1)}.

For the first image vector we need a, b ∈ R such
that

(1, 0) = (a, b)_F.

Since
(a, b)_F = a(2, 1) + b(1, 1) = (2a + b, a + b),

by equating coordinates we see that a = 1 and
b = −1, so (1, 0) = (1, −1)_F. Therefore

t(1, 0, 0) = (1, −1)_F.

For the second image vector we need c, d ∈ R such
that

(0, 1) = (c, d)_F.
Since
(c, d)_F = c(2, 1) + d(1, 1) = (2c + d, c + d),

by equating coordinates we obtain the system

2c + d = 0
c + d = 1.
Solving, we have c = −1 and d = 2, so

(0, 1) = (−1, 2)_F. Therefore
t(0, 1, 0) = (−1, 2)_F.

Finally, for the third image vector we need e, f ∈ R
such that

(0, 0) = (e, f)_F.

Using the same method as before we have
e = f = 0, so (0, 0) = (0, 0)_F. Therefore

t(0, 0, 1) = (0, 0)_F.
Hence the matrix of t with respect to the bases E
and F is

A = (  1  −1  0 )
    ( −1   2  0 )


(c) We find the images of the vectors in the
domain basis E = {(0, 1, 0), (1, 1, 1), (0, 1, 1)}:

t(0, 1, 0) = (0, 1), t(1, 1, 1) = (1, 1),
t(0, 1, 1) = (0, 1).

We find the F-coordinates of each of these image
vectors, where F = {(1, 3), (2, 4)}.

For the first image vector we need a, b ∈ R such
that

(0, 1) = (a, b)_F.
Since

(a, b)_F = a(1, 3) + b(2, 4) = (a + 2b, 3a + 4b),

by equating coordinates we obtain the system

a + 2b = 0
3a + 4b = 1.

Solving, we have a = 1 and b = −1/2, so
(0, 1) = (1, −1/2)_F.
Therefore
t(0, 1, 0) = (1, −1/2)_F.
For the second image vector we need c, d ∈ R such
that

(1, 1) = (c, d)_F.
Since
(c, d)_F = c(1, 3) + d(2, 4) = (c + 2d, 3c + 4d),

by equating coordinates we obtain the system

c + 2d = 1
3c + 4d = 1.
Solving, we have c = −1 and d = 1, so

(1, 1) = (−1, 1)_F. Therefore
t(1, 1, 1) = (−1, 1)_F.

Since t(0, 1, 1) = (0, 1) = t(0, 1, 0), we have
t(0, 1, 1) = (1, −1/2)_F.

Hence the matrix of t with respect to the bases E
and F is

A = (    1  −1     1 )
    ( −1/2   1  −1/2 )
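Strategy C15 is mechanical enough to sketch in code. The version below is our own and handles only transformations into R², so that F-coordinates can be found by Cramer's rule for a 2 × 2 system (a substitute for the elimination used above). The images listed in this solution are consistent with t(x, y, z) = (x, y), which is what the sketch assumes.

```python
from fractions import Fraction


def coords_2d(w, f1, f2):
    """F-coordinates (a, b) with w = a*f1 + b*f2, found by Cramer's rule."""
    det = f1[0] * f2[1] - f2[0] * f1[1]
    if det == 0:
        raise ValueError("f1 and f2 do not form a basis for R^2")
    a = Fraction(w[0] * f2[1] - f2[0] * w[1], det)
    b = Fraction(f1[0] * w[1] - w[0] * f1[1], det)
    return a, b


def matrix_wrt_bases(t, E, F):
    """Strategy C15: column j holds the F-coordinates of t(e_j)."""
    columns = [coords_2d(t(e), *F) for e in E]
    return [[col[i] for col in columns] for i in range(2)]


t = lambda v: (v[0], v[1])            # assumed: t(x, y, z) = (x, y)
E = [(0, 1, 0), (1, 1, 1), (0, 1, 1)]
F = [(1, 3), (2, 4)]
print(matrix_wrt_bases(t, E, F))      # rows (1, -1, 1) and (-1/2, 1, -1/2)
```

The same function reproduces part (b) when E is the standard basis for R³ and F = {(2, 1), (1, 1)}.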


Solution to Exercise C94 
We use Strategy C15.

(a) We find the images of the polynomials in the
domain basis E = {1, x, x²}:
t(1) = 0, t(x) = 1, t(x²) = 2x.

We find the F-coordinates of each of these image
vectors, where F = {2x, 1 + x}.

For the first image vector we have
t(1) = 0 = (0, 0)_F.

For the second image vector we need a, b ∈ R such
that

1 = (a, b)_F.

Since
(a, b)_F = a × (2x) + b × (1 + x)
= b + (2a + b)x,
by equating coefficients we obtain the system
b = 1
2a + b = 0.
Solving, we have a = −1/2 and b = 1, so

1 = (−1/2, 1)_F. Therefore
t(x) = (−1/2, 1)_F.

For the final image vector we have
t(x²) = 2x = (1, 0)_F.

Hence the matrix of t with respect to the bases E
and F is

A = ( 0  −1/2  1 )
    ( 0     1  0 )

Thus the matrix representation of t with respect to
the standard basis E and non-standard basis F is

t: P3 → P2
(a, b, c)_E ↦ A(a, b, c) = (−b/2 + c, b)_F.

(b) We find the images of the polynomials in the
domain basis E = {x, x², 1}:

t(x) = 1, t(x²) = 2x, t(1) = 0.

We find the F-coordinates of each of these image
vectors, where F = {2x, 1 + x}.
We know from part (a) that
t(x) = (−1/2, 1)_F,
t(x²) = (1, 0)_F,
t(1) = (0, 0)_F.

Hence the matrix of t with respect to the bases E
and F is

A = ( −1/2  1  0 )
    (    1  0  0 )

Thus the matrix representation of t with respect to
the non-standard bases E and F is

t: P3 → P2
(a, b, c)_E ↦ A(a, b, c) = (−a/2 + b, a)_F.


Solution to Exercise C95 


The functions in parts (a) and (d) are linear 
transformations since they are of the form 


t: R² → R²
(x, y) ↦ (ax + by, cx + dy),

for some a, b, c, d ∈ R.


The functions in parts (b) and (c) are not linear 
transformations since they are not of this form. 


Solution to Exercise C96 
(a) We have 
r(p(x, y)) = r(3x + y, −x)
= (3x + y, (3x + y) − x)
= (3x + y, 2x + y).
Thus r ∘ p is given by
r ∘ p: R² → R²
(x, y) ↦ (3x + y, 2x + y).
(b) We have 
p(r(x, y)) = p(x, x + y)
= (3x + (x + y), −x)
= (4x + y, −x).
Thus p ∘ r is given by
p ∘ r: R² → R²
(x, y) ↦ (4x + y, −x).
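The two compositions differ, which is easy to confirm numerically. Below is a small Python sketch in which p and r are reconstructed from the working above, namely p(x, y) = (3x + y, −x) and r(x, y) = (x, x + y). This is an assumption, since the exercise statement is not reproduced in this section, and the helper `compose` is hypothetical.

```python
# p and r as implied by the working above (assumption: the exercise
# statement itself is not reproduced in this section).
def p(x, y):
    return (3 * x + y, -x)

def r(x, y):
    return (x, x + y)

def compose(f, g):
    """Return f o g, i.e. the map v -> f(g(v)): g is applied first."""
    return lambda x, y: f(*g(x, y))

r_o_p = compose(r, p)
p_o_r = compose(p, r)
print(r_o_p(1, 2), p_o_r(1, 2))  # (5, 4) (6, -1)
```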


Solution to Exercise C97 


It follows from the Composition Rule that the
matrix of s ∘ t with respect to the standard bases
for the domain and codomain is


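$$\begin{pmatrix}2 & 1\\ 0 & 2\\ 1 & 0\end{pmatrix}\begin{pmatrix}1 & 0 & 2 & 4\\ 2 & 1 & 0 & 3\end{pmatrix} = \begin{pmatrix}4 & 1 & 4 & 11\\ 4 & 2 & 0 & 6\\ 1 & 0 & 2 & 4\end{pmatrix}.$$

Thus the matrix representation of s ∘ t with respect
to the standard bases for the domain and
codomain is

s ∘ t: R⁴ → R³
$$\begin{pmatrix}x\\ y\\ z\\ w\end{pmatrix} \mapsto \begin{pmatrix}4 & 1 & 4 & 11\\ 4 & 2 & 0 & 6\\ 1 & 0 & 2 & 4\end{pmatrix}\begin{pmatrix}x\\ y\\ z\\ w\end{pmatrix} = \begin{pmatrix}4x + y + 4z + 11w\\ 4x + 2y + 6w\\ x + 2z + 4w\end{pmatrix}.$$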


Solution to Exercise C98 


It follows from the Composition Rule that the
matrix of s ∘ t with respect to the standard bases
for the domain and codomain is


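$$\begin{pmatrix}0 & 1 & 0\\ 0 & 0 & 2\end{pmatrix}\begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix} = \begin{pmatrix}0 & 1 & 0\\ 0 & 0 & 2\end{pmatrix}.$$

Thus the matrix representation of s ∘ t with respect
to the standard bases for the domain and
codomain is

s ∘ t: P3 → P2
$$\begin{pmatrix}a\\ b\\ c\end{pmatrix} \mapsto \begin{pmatrix}0 & 1 & 0\\ 0 & 0 & 2\end{pmatrix}\begin{pmatrix}a\\ b\\ c\end{pmatrix} = \begin{pmatrix}b\\ 2c\end{pmatrix}.$$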
As expected, this is the same as the matrix 
representation for s. 


Solution to Exercise C99 
Since 
s(t(x, y)) 
= s(4x − y, −3x + y)
= ((4x − y) + (−3x + y), 3(4x − y) + 4(−3x + y))


= (x,y) 
and 
t(s(x,y)) 
= t(x +y, 3x + 4y) 
= (4(x + y) − (3x + 4y), −3(x + y) + (3x + 4y))
= (x,y), 


for each vector (x, y) in R², s is the inverse
function of t.
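Because s and t are linear, so are s ∘ t and t ∘ s. To confirm that both compositions equal the identity it is therefore enough to check them on the standard basis (1, 0), (0, 1). The Python sketch below does that, and also checks a small grid of points as an extra sanity check. The function name `is_inverse_pair` is hypothetical.

```python
def s(x, y):
    return (x + y, 3 * x + 4 * y)

def t(x, y):
    return (4 * x - y, -3 * x + y)

def is_inverse_pair(f, g, points):
    """True if f(g(v)) == v and g(f(v)) == v for every point v."""
    return all(f(*g(*v)) == v and g(*f(*v)) == v for v in points)

# For linear maps, agreement on a basis is enough; the grid is an extra check.
basis = [(1, 0), (0, 1)]
grid = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
print(is_inverse_pair(s, t, basis), is_inverse_pair(s, t, grid))  # True True
```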


Solution to Exercise C100 


(a) Since t is a linear transformation between two 
vector spaces of the same dimension, we use 
Strategy C16. 


First we find a matrix representation of t. We have 
t(1, 0) = (2, 4), t(0, 1) = (1, 2).


Hence the matrix representation of t with respect 
to the standard bases for the domain and 
codomain is 


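$$\begin{pmatrix}x\\ y\end{pmatrix} \mapsto \begin{pmatrix}2 & 1\\ 4 & 2\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}2x + y\\ 4x + 2y\end{pmatrix}.$$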


Next we evaluate the determinant of the matrix 


$$A = \begin{pmatrix}2 & 1\\ 4 & 2\end{pmatrix}.$$
We have
$$\det A = \begin{vmatrix}2 & 1\\ 4 & 2\end{vmatrix} = 4 - 4 = 0.$$


Since det A = 0, t is not invertible. 


(b) Since t is a linear transformation between two 
vector spaces of the same dimension, we use 
Strategy C16. 


First we find a matrix representation of t. We have 
t(1, 0) = (1, 3), t(0, 1) = (−1, 1).


Hence the matrix representation of t with respect 
to the standard bases for the domain and 


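codomain is
$$\begin{pmatrix}x\\ y\end{pmatrix} \mapsto \begin{pmatrix}1 & -1\\ 3 & 1\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}x - y\\ 3x + y\end{pmatrix}.$$

Next we evaluate the determinant of the matrix
$$A = \begin{pmatrix}1 & -1\\ 3 & 1\end{pmatrix}.$$
We have
$$\det A = \begin{vmatrix}1 & -1\\ 3 & 1\end{vmatrix} = 1 - (-3) = 4.$$

Since det A ≠ 0, t is invertible.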


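We now find the inverse function of t,
t⁻¹: R² → R². According to Strategy C16, t⁻¹
has the matrix representation v ↦ A⁻¹v, with
respect to the standard bases for the domain and
codomain. Since

$$A^{-1} = \frac{1}{4}\begin{pmatrix}1 & 1\\ -3 & 1\end{pmatrix} = \begin{pmatrix}\tfrac{1}{4} & \tfrac{1}{4}\\ -\tfrac{3}{4} & \tfrac{1}{4}\end{pmatrix},$$

it follows that t⁻¹ has the matrix representation

$$\begin{pmatrix}x\\ y\end{pmatrix} \mapsto \begin{pmatrix}\tfrac{1}{4} & \tfrac{1}{4}\\ -\tfrac{3}{4} & \tfrac{1}{4}\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}\tfrac{1}{4}x + \tfrac{1}{4}y\\ -\tfrac{3}{4}x + \tfrac{1}{4}y\end{pmatrix}.$$

So t⁻¹ is the linear transformation
t⁻¹: R² → R²
(x, y) ↦ (x/4 + y/4, −3x/4 + y/4).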
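For a 2 × 2 matrix, Strategy C16 reduces to the determinant test followed by the standard inverse formula A⁻¹ = (1/det A)(d −b; −c a) for A = (a b; c d). The Python sketch below covers parts (a) and (b) using exact fractions. The function names are hypothetical.

```python
from fractions import Fraction

def det2(A):
    """Determinant ad - bc of the 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    return a * d - b * c

def inverse2(A):
    """Inverse of a 2x2 matrix, or None when det A = 0 (t not invertible)."""
    (a, b), (c, d) = A
    D = det2(A)
    if D == 0:
        return None
    return [[Fraction(d, D), Fraction(-b, D)],
            [Fraction(-c, D), Fraction(a, D)]]

print(det2([[2, 1], [4, 2]]), inverse2([[2, 1], [4, 2]]))    # part (a): 0 None
print(det2([[1, -1], [3, 1]]), inverse2([[1, -1], [3, 1]]))  # part (b)
```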
(c) Since t is a linear transformation between two
vector spaces of the same dimension, we use
Strategy C16.


First we find a matrix representation of t. We have 
t(1, 0, 0) = (2, −1, 0), t(0, 1, 0) = (0, 3, 0),
t(0, 0, 1) = (0, 0, 1).

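Hence the matrix representation of t with respect
to the standard bases for the domain and
codomain is

$$\begin{pmatrix}x\\ y\\ z\end{pmatrix} \mapsto \begin{pmatrix}2 & 0 & 0\\ -1 & 3 & 0\\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix}x\\ y\\ z\end{pmatrix} = \begin{pmatrix}2x\\ -x + 3y\\ z\end{pmatrix}.$$

Next we evaluate the determinant of the matrix
$$A = \begin{pmatrix}2 & 0 & 0\\ -1 & 3 & 0\\ 0 & 0 & 1\end{pmatrix}.$$
We have
$$\det A = \begin{vmatrix}2 & 0 & 0\\ -1 & 3 & 0\\ 0 & 0 & 1\end{vmatrix} = 2\begin{vmatrix}3 & 0\\ 0 & 1\end{vmatrix} - 0 + 0 = 2 \times 3 = 6.$$

Since det A ≠ 0, t is invertible.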


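We now find the inverse function of t,
t⁻¹: R³ → R³. According to Strategy C16, t⁻¹
has the matrix representation v ↦ A⁻¹v, with
respect to the standard bases for the domain and
codomain.

Using row-reduction from Unit C1, we find

$$A^{-1} = \begin{pmatrix}\tfrac{1}{2} & 0 & 0\\ \tfrac{1}{6} & \tfrac{1}{3} & 0\\ 0 & 0 & 1\end{pmatrix},$$

so t⁻¹ has the matrix representation

$$\begin{pmatrix}x\\ y\\ z\end{pmatrix} \mapsto \begin{pmatrix}\tfrac{1}{2} & 0 & 0\\ \tfrac{1}{6} & \tfrac{1}{3} & 0\\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix}x\\ y\\ z\end{pmatrix} = \begin{pmatrix}\tfrac{1}{2}x\\ \tfrac{1}{6}x + \tfrac{1}{3}y\\ z\end{pmatrix}.$$

So t⁻¹ is the linear transformation
t⁻¹: R³ → R³
(x, y, z) ↦ (x/2, x/6 + y/3, z).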
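The row-reduction used to find A⁻¹ in part (c) can be sketched as Gauss–Jordan elimination on the augmented matrix (A | I). The minimal Python version below uses exact fractions and takes the first non-zero entry as the pivot. That pivot rule is an illustrative choice, not necessarily the order of operations used in Unit C1.

```python
from fractions import Fraction

def inverse(A):
    """Invert a square matrix by row-reducing (A | I) to (I | A^-1).

    Returns None if A is not invertible.
    """
    n = len(A)
    # Build the augmented matrix (A | I) with exact fractions.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below `col` with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # no pivot: det A = 0
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so that its leading entry is 1.
        lead = M[col][col]
        M[col] = [x / lead for x in M[col]]
        # Clear the rest of the column.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

A = [[2, 0, 0], [-1, 3, 0], [0, 0, 1]]
print(inverse(A))
```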
(d) Since t is a linear transformation between two
vector spaces of different dimensions, it follows
from Corollary C46 that t is not invertible.


Solution to Exercise C101 
The linear transformation
t: P3 → R³
a + bx + cx² ↦ (a, b, c)

is one-to-one and onto and hence invertible. It is
therefore an isomorphism.

(There are many other possibilities.)


Solution to Exercise C102 


The vector spaces R², C and P2 are isomorphic,
since they are all two-dimensional.

The vector spaces R³ and P3 are isomorphic, since
they are both three-dimensional.


Solution to Exercise C103 


(a) The image set of this linear transformation is 
the x-axis. This is a subspace of the codomain. 


(b) The image set of this linear transformation is
the line y = x. This is a subspace of the codomain.


Solution to Exercise C104 
We have 


t(1, 0) = (1, 1), t(0, 1) = (0, 0).

The image set of t is the line y = x; that is,
Im t = {(k, k) : k ∈ R}.
Thus Im t is spanned by (1, 1) = t(1, 0).


Solution to Exercise C105 
We use Strategy C17. 


(a) We take the standard basis {(1, 0), (0, 1)} for
the domain R².


We determine the images of these basis vectors: 
t(1, 0) = (1, 2), t(0, 1) = (0, 1).

The set {(1, 2), (0, 1)} is linearly independent, so it
is a basis for Im t.


Since the basis has two elements, 
dim(Im t) = 2. 
(b) We take the standard basis {1, x, x²} for the
domain P3.
We determine the images of these basis vectors:
t(1) = 0, t(x) = 1, t(x²) = 2x.

The set {0, 1, 2x} is not linearly independent since
it contains the zero vector. We discard 0 to give
the set {1, 2x}.

The vectors 1 and 2x are linearly independent, so
{1, 2x} is a basis for Im t.


Since the basis has two elements, 
dim(Im t) = 2. 
(c) We take the standard basis
{(1, 0, 0), (0, 1, 0), (0, 0, 1)} for the domain R³.
We determine the images of these basis vectors:
t(1, 0, 0) = (1, 1, 1), t(0, 1, 0) = (2, 0, 1),
t(0, 0, 1) = (3, 1, 2).
The set {(1, 1, 1), (2, 0, 1), (3, 1, 2)} is not linearly
independent. In fact,
(3, 1, 2) = (1, 1, 1) + (2, 0, 1),
so we discard (3, 1, 2) to give the set
{(1, 1, 1), (2, 0, 1)}.
The vectors (1, 1, 1) and (2, 0, 1) are linearly
independent, so {(1, 1, 1), (2, 0, 1)} is a basis for
Im t.


Since the basis has two elements, 
dim(Im t) = 2. 


(You may have chosen to discard (1, 1,1) or (2,0, 1) 
instead. This would still give a correct answer.) 
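Strategy C17 discards image vectors until the remaining ones are linearly independent. One way to automate this is sketched below in Python. It keeps an image vector only if that vector raises the rank of the vectors kept so far, with the rank computed by Gaussian elimination over fractions. Processing the images in this fixed order is an assumption; as noted above, other choices of which vector to discard are equally valid.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors (treated as rows), by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def image_basis(images):
    """Keep each image vector that is not in the span of those already kept."""
    basis = []
    for v in images:
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis

t_images = [(1, 1, 1), (2, 0, 1), (3, 1, 2)]  # images of the standard basis in part (c)
print(image_basis(t_images))  # [(1, 1, 1), (2, 0, 1)]
```

A zero image vector never raises the rank, so it is discarded automatically, as in part (b).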


Solution to Exercise C106 


(a) We know from Exercise C105(a) that
dim(Im t) = 2. Thus Im t is the whole of the
two-dimensional codomain R²; so t is onto.

(b) We know from Exercise C105(b) that
dim(Im t) = 2. Thus Im t is the whole of the
two-dimensional codomain P2; so t is onto.

(c) We know from Exercise C105(c) that
dim(Im t) = 2. Thus Im t is not the whole of the
three-dimensional codomain R³; so t is not onto.


Solution to Exercise C107 


(a) For this linear transformation, t(x, y, z) = 0 if
and only if (x, 0) = (0, 0), that is, if and only
if x = 0. Thus the kernel of t is the (y, z)-plane.
This is a subspace of the domain R³.

(b) For this linear transformation, t(x, y) = 0 if
and only if (x, x) = (0, 0), that is, if and only
if x = 0. Thus the kernel of t is the y-axis. This is
a subspace of the domain R².


Solution to Exercise C108 


(a) The kernel of t is the set of vectors (x, y) in R²
that satisfy

t(x, y) = 0,
that is,
(x, 2x + y) = (0, 0).
Equating coordinates, we obtain the system

x = 0
2x + y = 0.


Substituting x = 0 from the first equation into the 
second equation, we obtain y = 0. 


So the kernel of t is
Ker t = {(0, 0)}.

Since this contains only the zero vector,
dim(Ker t) = 0,

that is, Ker t is a zero-dimensional subspace of the
domain R².


(b) The kernel of t is the set of vectors (x, y, z)
in R³ that satisfy

t(x, y, z) = 0,
that is,
(x + 2y + 3z, x + z, x + y + 2z) = (0, 0, 0).

Equating coordinates, we obtain the system

x + 2y + 3z = 0
x + z = 0
x + y + 2z = 0.


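To solve this system we row-reduce the augmented
matrix. (The final column holds the row sums, as a check.)

$$
\begin{array}{l|rrr|r|r}
r_1 & 1 & 2 & 3 & 0 & 6\\
r_2 & 1 & 0 & 1 & 0 & 2\\
r_3 & 1 & 1 & 2 & 0 & 4\\ \hline
 & 1 & 2 & 3 & 0 & 6\\
r_2 \to r_2 - r_1 & 0 & -2 & -2 & 0 & -4\\
r_3 \to r_3 - r_1 & 0 & -1 & -1 & 0 & -2\\ \hline
 & 1 & 2 & 3 & 0 & 6\\
r_2 \to -\tfrac{1}{2}r_2 & 0 & 1 & 1 & 0 & 2\\
 & 0 & -1 & -1 & 0 & -2\\ \hline
r_1 \to r_1 - 2r_2 & 1 & 0 & 1 & 0 & 2\\
 & 0 & 1 & 1 & 0 & 2\\
r_3 \to r_3 + r_2 & 0 & 0 & 0 & 0 & 0
\end{array}
$$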


The augmented matrix is in row-reduced form and
we have
x + z = 0
y + z = 0.

Assigning the parameter k to the unknown z, we
obtain

x = −k, y = −k, z = k.
So the kernel of t is
Ker t = {(−k, −k, k) : k ∈ R},

that is, Ker t is the line through (0, 0, 0) and
(−1, −1, 1).




Thus
dim(Ker t) = 1,
that is, Ker t is a one-dimensional subspace of the
domain R³.
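The kernel calculation in part (b) can be automated. Row-reduce the coefficient matrix, then give each free unknown the parameter value 1 in turn (all other free unknowns 0) and read off the pivot unknowns. A minimal Python sketch follows; the helper names are hypothetical, and the check column is omitted because it only guards hand calculations.

```python
from fractions import Fraction

def rref(M):
    """Row-reduce M (a list of rows); return the reduced rows and pivot columns."""
    A = [[Fraction(x) for x in row] for row in M]
    pivots, r = [], 0
    for c in range(len(A[0])):
        if r == len(A):
            break
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue  # no leading 1 in this column: the unknown is free
        A[r], A[p] = A[p], A[r]
        lead = A[r][c]
        A[r] = [x / lead for x in A[r]]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
    return A, pivots

def kernel_basis(M):
    """Basis of the kernel: one vector per free unknown, with that unknown set to 1."""
    R, pivots = rref(M)
    n = len(M[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row, pc in zip(R, pivots):
            v[pc] = -row[free]  # pivot unknown = -(coefficient of the free unknown)
        basis.append(v)
    return basis

# Part (b): a single basis vector, so dim(Ker t) = 1.
print(kernel_basis([[1, 2, 3], [1, 0, 1], [1, 1, 2]]))
```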
Solution to Exercise C109 


Let p(x) = a + bx + cx² be a polynomial in P3.
Then

t(p(x)) = b + 2cx.

The kernel of t is the set of polynomials
p(x) = a + bx + cx² in P3 that satisfy

t(p(x)) = 0,
that is,
b + 2cx = 0.

Equating coefficients, we obtain the system

b = 0
2c = 0.

So a can take any real value, b = 0 and c = 0.
Thus the kernel of t is

Ker t = {p(x) : p(x) = a, a ∈ R},
that is, the set of constant polynomials. 


A basis for this subspace (the kernel) is {1}, so it 
follows that 


dim(Ker t) = 1.


Solution to Exercise C110 


(a) We have Ker t = {0}. Thus t is
one-to-one.

(b) We have Ker t ≠ {0}. Thus t is not
one-to-one.

(c) We have Ker t ≠ {0}. Thus t is not
one-to-one.


Solution to Exercise C111 


(a) For the linear transformation
t: R² → R²
(x, y) ↦ (x, 2x + y),

we found in Exercise C105(a) that dim(Im t) = 2,
and in Exercise C108(a) that dim(Ker t) = 0. Thus

dim(Im t) + dim(Ker t) = 2 + 0 = 2,
which is the dimension of the domain R².
(b) For the linear transformation

t: P3 → P2
p(x) ↦ p′(x),

we found in Exercise C105(b) that dim(Im t) = 2,
and in Exercise C109 that dim(Ker t) = 1. Thus

dim(Im t) + dim(Ker t) = 2 + 1 = 3,
which is the dimension of the domain P3.
(c) For the linear transformation
t: R³ → R³
(x, y, z) ↦ (x + 2y + 3z, x + z, x + y + 2z),

we found in Exercise C105(c) that dim(Im t) = 2,
and in Exercise C108(b) that dim(Ker t) = 1. Thus

dim(Im t) + dim(Ker t) = 2 + 1 = 3,

which is the dimension of the domain R³.


Solution to Exercise C112 


(a) In this case, the dimension of the codomain 
(which is 3) is greater than the dimension of the 
domain (which is 2), so t is not onto.


(b) In this case, the codomain and the domain 
both have dimension 2. There are two possibilities: 
either t is both one-to-one and onto, or t is neither 
one-to-one nor onto. 


(c) In this case, the dimension of the codomain 
(which is 2) is less than the dimension of the 
domain (which is 3), so t is not one-to-one. 


Solution to Exercise C113 


The number of solutions of this system of
equations is the same as the number of vectors that
map to (1, 1, 1) under the linear transformation

t: R³ → R³
(x, y, z) ↦ (x + 2y + 3z, x + z, x + y + 2z).

We know from the solution to Exercise C105(c)
that (1, 1, 1) is in the image set of t, and from
Exercise C108(b) that Ker t ≠ {0}. Thus the
system of equations has infinitely many solutions.


Solution to Exercise C114 


(a) This is a system of two linear equations in 
three unknowns. Since 3 > 2, the system has either 
no solutions or infinitely many solutions. 


(b) This is a system of three linear equations in 
three unknowns. There are two possibilities: 


• the system has exactly one solution for each set
of values of a, b and c

• there are some values of a, b and c for which the
system has no solutions; for all other values of a,
b and c, the system has infinitely many solutions.




Unit C4 
Eigenvectors 




Introduction 


By now you should be familiar with a wide variety of linear 
transformations from one vector space to another, and should appreciate 
that the matrix of a linear transformation depends on the bases chosen for 
the domain and codomain. In this final unit on linear algebra we 
concentrate on linear transformations from R² to R², from R³ to R³ and,
more generally, from Rⁿ to Rⁿ, and address the following question.


Is it possible to find a basis for both the domain and codomain so 
that the matrix of a linear transformation is a diagonal matrix? 


In the preceding units of this book you have studied vectors, matrices, 
vector spaces and linear transformations. The method for finding a 
diagonal matrix of a linear transformation (if such a matrix exists) links all 
these topics together. To round off the linear algebra topic, we use linear 
transformations and diagonal matrices to classify conics and quadrics. 


1 Eigenvalues and eigenvectors 


In this section you will see that some lines through the origin are mapped
to themselves by some linear transformations from R² to R²: the
individual points on these lines are usually moved, but, for a given line, all
the points are scaled by a constant factor. You will see that this idea of
fixed lines also applies to linear transformations from R³ to R³ and, more
generally, from Rⁿ to Rⁿ. You will learn how determinants can be used for
finding these fixed lines of linear transformations.


1.1 What is an eigenvector? 


In Subsection 1.1 of Unit C3 Linear transformations you saw that a linear 
transformation t : R² → R² moves the points of the plane around, but
fixes the origin. Furthermore, parallel lines get mapped to parallel lines. In 
this section we will observe that t may map some lines through the origin 
onto themselves. These ‘unchanged’ lines are rather special. 


Consider the linear transformation t : R² → R² given by

t(x, y) = (x + 4y, x − 2y).

We know that t maps the origin (0,0) to itself, since this is a property of 
all linear transformations. 


We can calculate the image of the point (1,0): 
t(1, 0) = (1 + (4 × 0), 1 − (2 × 0)) = (1, 1).




Since linear transformations map lines through the origin to lines through 
the origin, t maps the line joining the points (0,0) and (1,0) to the line 
joining the points (0,0) and (1,1), as illustrated in Figure 1; that is, 


t maps the line y = 0 to the line y = x.


[Figure 1: The image of the line y = 0 under the linear transformation t]


Let us now calculate the image of the point (1, −1):
t(1, −1) = (1 + 4(−1), 1 − 2(−1)) = (−3, 3).


In this case, the linear transformation t maps the line joining the points
(0, 0) and (1, −1) to the line joining the points (0, 0) and (−3, 3), as
illustrated in Figure 2; that is,


t maps the line y = −x to itself.


Although t moves individual points on the line (except (0,0)) to other 
points, the line as a whole is unchanged. 


[Figure 2: The image of the line y = −x under the linear transformation t]


The image of the point (1, −1) under t is the point (−3, 3) = −3(1, −1).
The vector (1, −1) is scaled (stretched) by a factor of −3; that is, the
resulting vector is three times as long as the original and points in the
opposite direction. In the next exercise you will investigate how other
vectors lying along the line y = −x are moved by t.


Exercise C115 


For the above linear transformation t, calculate the images of the vectors 
(2, −2) and (−7, 7). What do you notice?




We have seen that the linear transformation t scales some vectors lying 
along the line y = −x by the factor −3. In fact this is true of any vector
lying along this line, as we now show.

Let k be any real number, so that (k, −k) = k(1, −1) is a vector lying
along the line y = −x. Then

t(k, −k) = (k − 4k, k + 2k) = (−3k, 3k) = −3(k, −k),

which shows that t has the same scaling effect on each vector (k, −k) lying
along the line y = −x.


Does the linear transformation t map other lines through the origin to 
themselves? 


Exercise C116 


(a) For the above linear transformation t, calculate t(0, 1), t(1,2) and 
t(4,1). 

(b) Use one of the solutions to part (a) to write down another line in R²
that is mapped to itself by the linear transformation t. 


(c) Find t(4k, k). 


We have seen that the linear transformation t maps each of the lines
y = −x and x = 4y to itself. In both cases, each vector along the line is
moved to a scalar multiple of itself: each vector lying along the line y = −x
is mapped to −3 times itself and each vector lying along the line x = 4y is
mapped to 2 times itself. We call the non-zero vectors lying along the line
y = −x eigenvectors of t with corresponding eigenvalue −3; for example,
(1, −1) and (−7, 7) are eigenvectors of t with corresponding eigenvalue −3.
Similarly, we call the non-zero vectors lying along the line x = 4y
eigenvectors of t with corresponding eigenvalue 2; for example, (4, 1) and
(−8, −2) are eigenvectors of t with corresponding eigenvalue 2.
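The definition can be tested directly: for a non-zero v, compute t(v) and check whether it is a scalar multiple of v. The Python sketch below does this for the transformation t(x, y) = (x + 4y, x − 2y) of this section. `eigenvalue_of` is a hypothetical helper that returns λ, or None if v is not an eigenvector.

```python
from fractions import Fraction

def t(x, y):
    return (x + 4 * y, x - 2 * y)

def eigenvalue_of(v):
    """Return lam if t(v) = lam * v for the non-zero vector v, otherwise None."""
    x, y = v
    if (x, y) == (0, 0):
        return None  # the zero vector is excluded by definition
    tx, ty = t(x, y)
    # The only candidate scalar comes from a non-zero coordinate of v.
    lam = Fraction(tx, x) if x != 0 else Fraction(ty, y)
    return lam if (tx, ty) == (lam * x, lam * y) else None

for v in [(1, -1), (-7, 7), (4, 1), (-8, -2), (1, 0)]:
    print(v, eigenvalue_of(v))
```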


More generally, we make the following definitions; here and throughout 
this unit we use V to denote a finite-dimensional vector space. 


Definitions 


Let t: V → V be a linear transformation. An eigenvector of t is a
non-zero vector v that is mapped by t to a scalar multiple of itself;
this scalar is the corresponding eigenvalue.

In symbols, a non-zero vector v is an eigenvector of a linear
transformation t if

t(v) = λv, for some λ ∈ R;

λ is the corresponding eigenvalue.


[Images: David Hilbert; Werner Heisenberg]


We exclude the case v = 0, since t(0) = 0 for every linear
transformation t. It is, however, possible for λ to be 0: when λ = 0, the
linear transformation maps every vector corresponding to this eigenvalue
to the origin; you will see an instance of this in Exercise C120.


Eigen is a German word meaning own, characteristic or special. 
Another name for eigenvalue is characteristic value. 


The eigen terms are associated with the German mathematician 
David Hilbert (1862-1943) who first used the terms Eigenfunktion
(eigenfunction) and Eigenwert (eigenvalue) in a series of papers on
integral equations (1904-1910). It is possible that Hilbert was
following the German physicist Hermann von Helmholtz (1821-1894)
who used the term Eigentöne in acoustics in the nineteenth century.


In the 1920s the use of the eigen terminology was promoted through 
the development of the matrix mechanics formulation of quantum 
theory by the German physicist Werner Heisenberg (1901-1976) who 
wrote the new theory in the language of Hilbert and his followers. 


In the example above we found two lines that are mapped to themselves 
by t, by considering the images of various points. This is a rather 
hit-and-miss way of finding eigenvalues and eigenvectors. Before 
developing a general method for finding them, we see that it is sometimes 
possible to do so by considering the geometry of the transformation. 


Worked Exercise C62 


Let t : R² → R² be the linear transformation that maps each point to its
reflection in the x-axis. By considering the geometric features of t, 
determine as many eigenvectors of t as you can and write down the 
corresponding eigenvalue in each case. 


Solution 
Reflection in the x-axis maps each point (x, y) to the point (x, −y).
(A sketch can help.)




Exercise C117 


By considering the geometric features of each of the following linear 
transformations of the plane, determine as many eigenvectors as you can 
and write down the corresponding eigenvalue in each case: 


(a) reflection in the line y = x 
(b) 2-dilation 
(c) anticlockwise rotation through π/2 about the origin
(d) anticlockwise rotation through π about the origin.


In Exercise C117 it is possible to spot the eigenvectors geometrically. We 
now illustrate a general method to determine the eigenvalues and 
eigenvectors for any given transformation. 


Consider again the linear transformation t : R² → R² given by
t(x, y) = (x + 4y, x − 2y).

We wish to find those vectors (x, y) that are mapped to scalar multiples of
themselves; that is,

t(x, y) = λ(x, y) = (λx, λy).
We equate the expressions for t(x, y) and obtain

(x + 4y, x − 2y) = (λx, λy).




Equating the first and second coordinates of these vectors, we obtain the
system of linear equations

x + 4y = λx
x − 2y = λy.

This is a system of two equations in the three unknowns x, y and λ. One
way of solving this system is to move the terms on the right to the
left-hand side. Thus we obtain the system
$$\begin{aligned}(1-\lambda)x + 4y &= 0\\ x + (-2-\lambda)y &= 0.\end{aligned} \tag{1}$$


Equations (1) are called the eigenvector equations. We use them to find
the possible values of λ, and then to find all the eigenvectors that
correspond to these values. They are homogeneous equations in x and y
since the constant terms are all zero.


Systems of homogeneous linear equations always have the trivial solution, 
in this case x = 0, y = 0, but this corresponds to the zero vector, which is 
excluded. Thus we seek non-zero solutions to the pair of homogeneous 
equations (1). Since we have two equations in three unknowns, such a
system is bound to be dependent; that is, the homogeneous system has
insufficient constraints on the unknowns to determine them uniquely.


From Theorem C19, Summary Theorem, in Unit C1 Linear equations and 
matrices we know that a homogeneous system has only the trivial solution 
if and only if the determinant of the coefficient matrix is non-zero. The 
contrapositive of this tells us that non-zero solutions exist if and only if the 
determinant of the coefficient matrix is 0; that is, if and only if 


$$\begin{vmatrix}1-\lambda & 4\\ 1 & -2-\lambda\end{vmatrix} = 0.$$

We expand the determinant and obtain

$$(1-\lambda)(-2-\lambda) - 4 \times 1 = 0,$$

which simplifies (after some algebra) to
λ² + λ − 6 = 0.


This equation is called the characteristic equation of t, and its solutions
are the eigenvalues we seek. Notice that the characteristic equation,
whether or not it is written in terms of a determinant, is a polynomial
equation in λ whose degree is the dimension of the domain of t (in this
case 2). Here, we have

λ² + λ − 6 = (λ − 2)(λ + 3) = 0,
so the eigenvalues are λ = 2 and λ = −3.
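For any 2 × 2 matrix A = (a b; c d), expanding det(A − λI) gives λ² − (a + d)λ + (ad − bc), so the eigenvalues are the real roots of a quadratic. The Python sketch below applies the quadratic formula in floating point. Using floats is an implementation choice, because exact roots would need surds in general. When the discriminant is negative the function returns an empty list (no real eigenvalues).

```python
import math

def char_poly_2x2(A):
    """Coefficients (1, p, q) of det(A - lam*I) = lam^2 + p*lam + q."""
    (a, b), (c, d) = A
    return (1, -(a + d), a * d - b * c)

def real_eigenvalues_2x2(A):
    """Real solutions of the characteristic equation, largest first; [] if none."""
    _, p, q = char_poly_2x2(A)
    disc = p * p - 4 * q
    if disc < 0:
        return []  # no real eigenvalues
    root = math.sqrt(disc)
    return sorted({(-p + root) / 2, (-p - root) / 2}, reverse=True)

A = [[1, 4], [1, -2]]
print(char_poly_2x2(A))         # (1, 1, -6), i.e. lam^2 + lam - 6
print(real_eigenvalues_2x2(A))  # [2.0, -3.0]
```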


To find the corresponding eigenvectors, we consider each eigenvalue λ in
turn.


Putting λ = 2 into the eigenvector equations (1), we obtain

−x + 4y = 0
x − 4y = 0.

One equation is −1 times the other, so the equations are equivalent
to the single equation

x = 4y.

Thus the eigenvectors corresponding to λ = 2 are the non-zero
vectors (x, y) for which x = 4y; that is, the vectors of the form

(4k, k), where k ≠ 0.


Since we are working in a real vector space, in this case R², when we are
talking about eigenvectors, k represents a real number.

Putting λ = −3 into the eigenvector equations (1), we obtain

4x + 4y = 0
x + y = 0.

These equations are equivalent to the single equation

y = −x.
Thus the eigenvectors corresponding to λ = −3 are the non-zero
vectors (x, y) for which y = −x; that is, the vectors of the form

(k, −k), where k ≠ 0.
Thus the eigenvectors of t are the non-zero vectors of the following forms:
(4k, k), corresponding to λ = 2,
(k, −k), corresponding to λ = −3.
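Solving the eigenvector equations for a 2 × 2 matrix needs only one non-zero row of A − λI. If (a − λ)x + by = 0 with (a − λ, b) ≠ (0, 0), then (x, y) = (b, −(a − λ)) is a solution, and every eigenvector for λ is a non-zero multiple of it. The Python sketch below relies on this fact. It assumes that λ is an integer eigenvalue of an integer matrix, so the direction can be scaled to its simplest integer form. The helper name is hypothetical.

```python
from math import gcd

def eigenvector_2x2(A, lam):
    """Simplest integer solution (x, y) != (0, 0) of (A - lam*I)v = 0.

    Assumes lam is an integer eigenvalue of the integer matrix A, so the two
    eigenvector equations are multiples of each other. Returns None when
    A - lam*I is the zero matrix (then every non-zero vector is an eigenvector).
    """
    (a, b), (c, d) = A
    if (a - lam, b) != (0, 0):
        x, y = b, -(a - lam)      # solves (a - lam)x + b y = 0
    elif (c, d - lam) != (0, 0):
        x, y = d - lam, -c        # solves c x + (d - lam)y = 0
    else:
        return None
    g = gcd(x, y)                 # divide out common factors
    x, y = x // g, y // g
    if x < 0 or (x == 0 and y < 0):
        x, y = -x, -y             # make the first non-zero entry positive
    return (x, y)

A = [[1, 4], [1, -2]]
print(eigenvector_2x2(A, 2), eigenvector_2x2(A, -3))  # (4, 1) (1, -1)
```

Every eigenvector for λ is then k times the returned vector, with k ≠ 0, matching the forms (4k, k) and (k, −k) above.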


This method produces all the eigenvalues and eigenvectors of the linear 
transformation. On the other hand, trying to show that these are the only 
ones by calculating the images of various points, as we started to do at the 
beginning of the section, would take forever! 


Exercise C118 


Let t : R² → R² be the linear transformation given by
(a) Find the eigenvector equations of t. 


(b) Find the characteristic equation of t, and solve it to find the 
eigenvalues of t. 


(c) Solve the eigenvector equations, for each eigenvalue in turn, to find 
the eigenvectors of t. 




1.2 Finding eigenvalues and eigenvectors 


You have just seen how to find the eigenvalues and eigenvectors of a given
linear transformation t : R² → R². This method, as it stands, is rather
tedious to use to find eigenvalues and eigenvectors of linear
transformations from R³ to R³, or R⁴ to R⁴, and so on. However, by
introducing matrices, we can simplify the method.


We now work through the same example as in the previous subsection, but 
this time we use matrices. 


Theorem C40 of Unit C3 tells us that there is a unique matrix for t with 
respect to the standard (ordered) basis in both the domain and codomain, 
and we use Strategy C15 from that unit to find this matrix. Recall that 
this strategy tells us essentially to ‘read off’ the matrix of a linear 
transformation when we are using the standard bases. We have
t(1, 0) = (1, 1) and t(0, 1) = (4, −2), so these vectors are the columns of the
matrix of the linear transformation, since we are using the standard bases.


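Therefore, with respect to the standard basis for R², the linear
transformation t given by t(x, y) = (x + 4y, x − 2y) has the matrix
representation

$$t:\ \mathbf{v} \mapsto A\mathbf{v}, \quad \text{where } \mathbf{v} = \begin{pmatrix}x\\ y\end{pmatrix} \text{ and } A = \begin{pmatrix}1 & 4\\ 1 & -2\end{pmatrix}.$$

If v is an eigenvector of t with corresponding eigenvalue λ, then
t(v) = λv;

in matrix form, this becomes

$$\begin{pmatrix}1 & 4\\ 1 & -2\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} - \lambda\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}0\\ 0\end{pmatrix}. \tag{2}$$

Using the 2 × 2 identity matrix I, we can write

$$\lambda\begin{pmatrix}x\\ y\end{pmatrix} = \lambda I\begin{pmatrix}x\\ y\end{pmatrix} = \lambda\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix},$$

so equation (2) can be written as

$$\begin{pmatrix}1 & 4\\ 1 & -2\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} - \lambda\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}0\\ 0\end{pmatrix}.$$

We simplify this matrix equation and obtain

$$\begin{pmatrix}1-\lambda & 4\\ 1 & -2-\lambda\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}0\\ 0\end{pmatrix}.$$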


This gives rise to the eigenvector equations 


(1 − λ)x + 4y = 0
x + (−2 − λ)y = 0,

as before, which we labelled equations (1). The characteristic equation is

$$\begin{vmatrix}1-\lambda & 4\\ 1 & -2-\lambda\end{vmatrix} = 0;$$
that is,
det(A − λI) = 0.


We can therefore find the characteristic equation directly from the matrix 
of the linear transformation (with respect to the standard basis for both 
the domain and codomain) by subtracting λ from each diagonal entry and
then equating the determinant to zero. 


Once we have found the eigenvalues, we use the same method as before to 
find the eigenvectors; that is, we substitute each eigenvalue in turn into the 
eigenvector equations and solve them. 


In view of this connection with matrices, we adopt the following definitions. 


Definitions 


A non-zero vector v is an eigenvector of a square matrix A if
Av = λv, for some λ ∈ R;

λ is the corresponding eigenvalue.

The characteristic equation of a square matrix A is the equation

det(A − λI) = 0.


In this way we can refer to eigenvectors, eigenvalues and the characteristic 
equation of a matrix even when a linear transformation is not explicitly 
involved. 


The matrix A − λI is obtained by subtracting λ from each entry on the
diagonal of A.


Eigenvalues and eigenvectors of matrices occur naturally in many 
applications — for example, in the study of vibrating mechanical 
systems. In such examples, the characteristic equation may have 
solutions that are not real numbers, and these complex eigenvalues 
have significance in these applications. In this unit we are primarily 
interested in linear transformations of the plane and of 
three-dimensional space, so complex eigenvalues play no role here: we 
are concerned only with real eigenvalues and eigenvectors. 


Other areas of application include music, bridge design, oil 
exploration, image compression, and analysis of financial data. A 
particular example is the use of eigenvectors in the PageRank 
algorithm. This algorithm was invented by Larry Page and 

Sergey Brin, the founders of Google, in 1996 for use by the Google 
search engine to rank the importance of web pages. According to 
Google, PageRank works by counting the number and quality of links 
to a page to determine a rough estimate of how important the website 
is. The underlying assumption is that more important websites are 
likely to receive more links from other websites. The algorithm assigns 
a PageRank, or score, to each web page based on its linking web 
pages, with the links from different web pages being weighted 
according to particular criteria. The Google matrix represents the 
links between the web pages. A fundamental part of the algorithm is 
an iterative method that computes the dominant eigenvalue, that is, 
the eigenvalue of largest magnitude, and the corresponding 
eigenvector of the Google matrix to rank the web pages. 


[Image: Larry Page and Sergey Brin]

If a characteristic equation has no real solutions, then we say that there
are no eigenvalues. For example, in Exercise C117(c), you considered the
linear transformation representing an anticlockwise rotation through π/2
about the origin. The matrix of this linear transformation is

$$A = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}.$$

By the above definition, the characteristic equation of this linear
transformation is

$$\det(A - \lambda I) = \begin{vmatrix}0-\lambda & -1\\ 1 & 0-\lambda\end{vmatrix} = 0.$$
We expand the determinant and obtain
λ² + 1 = 0.


This equation has no real solutions: the linear transformation has no 
eigenvalues and hence no eigenvectors. This agrees with the geometric 
interpretation: no line through the origin is mapped to itself by this 
rotation. 


We summarise this matrix method for finding eigenvalues and eigenvectors 
in the following strategy. 




Strategy C18 


To determine the eigenvalues and eigenvectors of a square matrix A, 
do the following. 


1. Find the eigenvalues:
• write down the characteristic equation
det(A − λI) = 0
• expand this determinant to obtain a polynomial equation in λ
• solve this equation to find the eigenvalues.
2. Find the eigenvectors:
• write down the eigenvector equations
(A − λI)v = 0
• for each eigenvalue λ, solve this system of linear equations to
find the corresponding eigenvectors.


We illustrate Strategy C18 with the following worked exercise and exercise. 


Worked Exercise C63 


Let t : R² → R² be the linear transformation given by

t(x, y) = (5x + 2y, 2x + 5y).

Write down the matrix of t with respect to the standard basis for R², and
find the eigenvalues and eigenvectors of t.




which simplifies to
λ² − 10λ + 21 = (λ − 7)(λ − 3) = 0.
The eigenvalues of A are therefore λ = 7 and λ = 3.


Next we find the eigenvectors of A. 


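(The eigenvector equations are (A − λI)v = 0; that is,

$$\begin{pmatrix}5-\lambda & 2\\ 2 & 5-\lambda\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}0\\ 0\end{pmatrix},$$

which we write as a system of linear equations.)

The eigenvector equations are

(5 − λ)x + 2y = 0
2x + (5 − λ)y = 0.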


λ = 7 | The eigenvector equations become

−2x + 2y = 0
2x − 2y = 0.

These equations are equivalent to the single equation
y = x.

Thus the eigenvectors corresponding to λ = 7 are the non-zero
vectors for which y = x; that is, the vectors of the form

(k, k), where k ≠ 0.
λ = 3 | The eigenvector equations become
2x + 2y = 0
2x + 2y = 0.
These equations are equivalent to the single equation
y = −x.

Thus the eigenvectors corresponding to λ = 3 are the non-zero
vectors for which y = −x; that is, the vectors of the form

(k, −k), where k ≠ 0.


Thus the eigenvectors of the linear transformation t are the non-zero 
vectors of the following forms: 


(k, k), corresponding to λ = 7,
(k, −k), corresponding to λ = 3.


Exercise C119 


For each of the following linear transformations t : R² → R², write down
the matrix of t with respect to the standard basis for R², and find the
eigenvalues and eigenvectors of t.

(a) t(x, y) = (x + 3y, 2x − 4y)
(b) t(x, y) = (x − 2y, −2x − 2y)


So far we have concentrated on linear transformations from R² to R² and
on 2 × 2 matrices. We now use Strategy C18 to find the eigenvalues and
eigenvectors of a linear transformation from R³ to R³ using a 3 × 3 matrix.
Notice that here the characteristic equation is again a polynomial equation
in λ whose degree is the dimension of the domain of t (in this case 3).


Worked Exercise C64 


Let t : R³ → R³ be the linear transformation given by
t(x, y, z) = (2x + z, −x + 2y + 3z, x + 2z).

Write down the matrix of t with respect to the standard basis for R³, and
find the eigenvalues and eigenvectors of t.


Solution 


Since we are using the standard basis, we can again simply ‘read
off’ the matrix: the columns are the images of (1, 0, 0), (0, 1, 0) and
(0, 0, 1) under t.

The matrix of t with respect to the standard basis for R³ is
$$A = \begin{pmatrix} 2 & 0 & 1\\ -1 & 2 & 3\\ 1 & 0 & 2 \end{pmatrix}.$$


We use Strategy C18 to find the eigenvalues and eigenvectors of A, 
which are the same as those of t. 


First we find the eigenvalues of A. 


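Here we need the 3 × 3 identity matrix
$$I = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix},$$
and so we subtract λ from the three diagonal entries of A.
The characteristic equation is det(A − λI) = 0; that is,
$$\begin{vmatrix} 2-\lambda & 0 & 1\\ -1 & 2-\lambda & 3\\ 1 & 0 & 2-\lambda \end{vmatrix} = 0.$$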




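We expand the determinant along the top row and obtain
$$(2-\lambda)\begin{vmatrix} 2-\lambda & 3\\ 0 & 2-\lambda \end{vmatrix} + 1\begin{vmatrix} -1 & 2-\lambda\\ 1 & 0 \end{vmatrix} = 0.$$

Simplifying this expression, we obtain
$$(2-\lambda)\big((2-\lambda)^2 - 0\big) + \big(0 - (2-\lambda)\big) = 0.$$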


When there is a common factor, it is best to keep it separate:
the problem then reduces to factorising the remaining quadratic
polynomial.

Taking out the common factor gives
$$(2-\lambda)\big((2-\lambda)^2 - 1\big) = 0,$$

which simplifies to
$$(2-\lambda)(\lambda^2 - 4\lambda + 3) = 0.$$

We can factorise this characteristic equation as
$$(2-\lambda)(\lambda-3)(\lambda-1) = 0.$$

The eigenvalues of A are therefore λ = 3, λ = 2 and λ = 1.


Next we find the eigenvectors of A. 


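The eigenvector equations are
$$\begin{aligned} (2-\lambda)x + z &= 0\\ -x + (2-\lambda)y + 3z &= 0\\ x + (2-\lambda)z &= 0. \end{aligned}$$

λ = 3: The eigenvector equations become
$$\begin{aligned} -x + z &= 0\\ -x - y + 3z &= 0\\ x - z &= 0. \end{aligned}$$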


It may sometimes be necessary to use the method of
Gauss–Jordan elimination from Unit C1, but here the solutions
can be found directly.

The first and third equations imply that
$$z = x.$$

Substituting this into the second equation yields the equation
$$2x - y = 0.$$

Thus the eigenvectors corresponding to λ = 3 are the non-zero
vectors (x, y, z) satisfying z = x and y = 2x; that is, the vectors
of the form

(k, 2k, k), where k ≠ 0.




λ = 2: The eigenvector equations become
$$\begin{aligned} z &= 0\\ -x + 3z &= 0\\ x &= 0. \end{aligned}$$

These equations have the solution
$$x = 0 \quad\text{and}\quad z = 0.$$

However, there are no constraints on the unknown y. Thus the
eigenvectors corresponding to λ = 2 are the non-zero vectors
(x, y, z) satisfying x = 0 and z = 0; that is, the vectors of the
form

(0, k, 0), where k ≠ 0.


λ = 1: The eigenvector equations become
$$\begin{aligned} x + z &= 0\\ -x + y + 3z &= 0\\ x + z &= 0. \end{aligned}$$

The first and third equations imply that
$$z = -x.$$

Substituting this into the second equation yields the equation
$$-4x + y = 0.$$

Thus the eigenvectors corresponding to λ = 1 are the non-zero
vectors (x, y, z) satisfying z = −x and y = 4x; that is, the
vectors of the form

(k, 4k, −k), where k ≠ 0.


Thus the eigenvectors of the linear transformation t are the non-zero
vectors of the following forms:

(k, 2k, k), corresponding to λ = 3,

(0, k, 0), corresponding to λ = 2,

(k, 4k, −k), corresponding to λ = 1.
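As a check on Worked Exercise C64, this illustrative sketch evaluates det(A − λI) at each claimed eigenvalue, using expansion along the top row. It also tests Av = λv for one member of each family of eigenvectors. The function names det3, char_value and mat_vec are hypothetical.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix, expanding along the top row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_value(A, lam):
    """Value of det(A - λI) for a 3 x 3 matrix A."""
    return det3([[A[r][s] - (lam if r == s else 0) for s in range(3)] for r in range(3)])

def mat_vec(A, v):
    """The product Av, with v written as a list."""
    return [sum(A[r][s] * v[s] for s in range(3)) for r in range(3)]

A = [[2, 0, 1], [-1, 2, 3], [1, 0, 2]]
print([char_value(A, lam) for lam in (3, 2, 1)])   # [0, 0, 0]

k = 5  # any non-zero k will do
for lam, v in [(3, [k, 2 * k, k]), (2, [0, k, 0]), (1, [k, 4 * k, -k])]:
    print(lam, mat_vec(A, v) == [lam * x for x in v])   # 3 True, 2 True, 1 True
```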


Although cubic polynomials may not always be easy to factorise, you met 
some ways of factorising such polynomials in Subsection 1.4 of Unit A2 
Number systems. However, we will usually deal with examples that 
factorise easily. 




The following result, which we do not prove here, gives a useful check on
the values found for the eigenvalues. You are asked to prove it yourself for
2 × 2 matrices in the additional exercises booklet for this unit.


Proposition C56 


The sum of the eigenvalues of a square matrix A is equal to the sum 
of the diagonal entries of A. 


For example, in Worked Exercise C64 the eigenvalues are 3, 2 and 1, which 
sum to 6, and the diagonal entries of the matrix A are 2, 2 and 2, which 
also sum to 6. 
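A minimal sketch of the check in Proposition C56, applied to the matrices of Worked Exercise C64 and Exercise C120. The eigenvalues are typed in by hand, so the sketch only confirms that the numbers are consistent; it proves nothing.

```python
def trace(A):
    """Sum of the diagonal entries of a square matrix A."""
    return sum(A[i][i] for i in range(len(A)))

# Worked Exercise C64: eigenvalues 3, 2 and 1.
A = [[2, 0, 1], [-1, 2, 3], [1, 0, 2]]
print(trace(A), sum([3, 2, 1]))   # 6 6

# Exercise C120: eigenvalues 6, 3 and 0.
B = [[4, 2, 0], [2, 3, 2], [0, 2, 2]]
print(trace(B), sum([6, 3, 0]))   # 9 9
```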


The sum of the diagonal entries of a square matrix is sometimes referred to 
as the trace of the matrix. 


Exercise C120 
Let t : R³ → R³ be the linear transformation given by

t(x, y, z) = (4x + 2y, 2x + 3y + 2z, 2y + 2z).

Write down the matrix of t with respect to the standard basis for R³, and
find the eigenvalues and eigenvectors of t.


In most of the examples we have seen so far, the eigenvalues have not been 
easy to recognise directly and Strategy C18 has been required to find 
them. This is not always the case, as the following exercise illustrates. 


Exercise C121 


Find the eigenvalues of each of the following matrices. 


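(a) a 2 × 2 matrix whose entries are illegible in the source

(b) $$\begin{pmatrix} 8 & 0 & 0\\ 0 & -5 & 0\\ 0 & 0 & 21 \end{pmatrix}$$

(c) $$\begin{pmatrix} 4 & 0 & 0\\ 25 & -2 & 0\\ 17 & \ast & 6 \end{pmatrix}$$ (the entry marked ∗ is illegible in the source)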


Finding eigenvalues of triangular and diagonal matrices is straightforward, 
as Exercise C121 illustrates. The eigenvalues are the diagonal entries of 
the matrix and no calculation is needed to find them. 


Theorem C57 


The eigenvalues of a triangular matrix and of a diagonal matrix are 
the diagonal entries of the matrix. 




Proof A lower triangular matrix has every entry above the main
diagonal equal to zero. A diagonal matrix and the transpose of an upper
triangular matrix are lower triangular matrices, so we can consider just
lower triangular matrices here.

Let A = (aᵢⱼ) be an n × n lower triangular matrix, so aᵢⱼ = 0 for all j > i.
The eigenvalues of A are the solutions of the characteristic equation
det(A − λI) = 0. Now A − λI has diagonal entries aᵢᵢ − λ, and every entry
above the main diagonal is zero.

We expand the determinant along the top row and continue by
expanding along the top row of the resulting determinants until the only
determinants in the expression are of size 2 × 2.

At each stage the only possibly non-zero entry in the top row is the first
one, and deleting its row and column leaves another lower triangular
determinant. So the first term in the full expansion of the determinant is
the only non-zero term, namely (a₁₁ − λ)(a₂₂ − λ) ··· (aₙₙ − λ).
Therefore the solutions of the characteristic equation det(A − λI) = 0 are
a₁₁, a₂₂, ..., aₙₙ, by the Factor Theorem (Theorem A2 in Unit A2), and
the eigenvalues of A are precisely the diagonal entries of the matrix.

Finally, det(Aᵀ − λI) = det((A − λI)ᵀ) = det(A − λI), since det Bᵀ = det B
for any square matrix B, so a matrix and its transpose have the same
eigenvalues. A diagonal matrix is lower triangular, so the eigenvalues of a
triangular or diagonal matrix are the diagonal entries. ■
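The proof can be illustrated numerically. This Python sketch computes det(A − λI) by repeated top-row expansion, as in the proof. It uses a made-up 4 × 4 lower triangular matrix whose entries are arbitrary and purely illustrative. At several values of λ, it checks that the result agrees with (a₁₁ − λ)(a₂₂ − λ)(a₃₃ − λ)(a₄₄ − λ).

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the top row (fine for small n)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]   # delete row 1 and column j
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def char_value(A, lam):
    """Value of det(A - λI) for a square matrix A."""
    n = len(A)
    return det([[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)])

# A made-up lower triangular matrix: every entry above the main diagonal is 0.
L = [[3, 0, 0, 0],
     [5, -1, 0, 0],
     [2, 8, 7, 0],
     [1, 9, -4, Fraction(1, 2)]]

for lam in [0, 1, Fraction(5, 3), -2, 10]:
    product = 1
    for i in range(len(L)):
        product *= L[i][i] - lam          # (a11 - λ)(a22 - λ)...(a44 - λ)
    assert char_value(L, lam) == product

print([lam for lam in (3, -1, 7, Fraction(1, 2)) if char_value(L, lam) == 0])
# [3, -1, 7, Fraction(1, 2)]
```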


1.3 Eigenspaces 


In Subsection 1.1 we considered the linear transformation t : R² → R²
given by

t(x, y) = (x + 4y, x − 2y)

and saw that each of the lines y = −x and x = 4y is mapped to itself.

The line y = −x, shown in Figure 3, consists of the points of the form
(k, −k), each of which is an eigenvector of t corresponding to the
eigenvalue λ = −3, except when k = 0, which is specifically excluded.
Similarly, the line x = 4y, also shown in Figure 3, consists of the points of
the form (4k, k), each of which is an eigenvector corresponding to the
eigenvalue λ = 2, except when k = 0.

Figure 3 The lines comprising the eigenvectors of t


In this example, for each eigenvalue λ, the set of all solutions of the
equation t(v) = λv (including v = 0) is a line through the origin. In
general, the set of such solutions is a subspace of the domain of t.


Theorem C58

Let t : V → V be a linear transformation. For each eigenvalue λ of t,
let S(λ) be the set of vectors satisfying t(v) = λv; that is, S(λ) is the
set of eigenvectors corresponding to λ, together with the zero
vector 0. Then S(λ) is a subspace of V.




Proof Consider any eigenvalue λ of a linear transformation t : V → V.

We use Strategy C10 from Unit C2, Vector spaces, and first check that
0 ∈ S(λ).

For any linear transformation t, we have t(0) = 0 = λ0, so 0 ∈ S(λ).
Next we check that if v₁, v₂ ∈ S(λ), then v₁ + v₂ ∈ S(λ).
Let v₁, v₂ ∈ S(λ). Then
$$t(\mathbf{v}_1 + \mathbf{v}_2) = t(\mathbf{v}_1) + t(\mathbf{v}_2) = \lambda\mathbf{v}_1 + \lambda\mathbf{v}_2 = \lambda(\mathbf{v}_1 + \mathbf{v}_2),$$
since t is a linear transformation.
Hence v₁ + v₂ ∈ S(λ).
Finally, we check that if v ∈ S(λ) and α ∈ R, then αv ∈ S(λ).
Let v ∈ S(λ) and α ∈ R. Then
$$t(\alpha\mathbf{v}) = \alpha t(\mathbf{v}) = \alpha\lambda\mathbf{v} = \lambda(\alpha\mathbf{v}),$$
since t is a linear transformation.
Hence αv ∈ S(λ).
Thus S(λ) is a subspace of V. ■


Since S(λ) is a subspace comprising eigenvectors (and 0), we call it an
eigenspace.


Definition
Let t : V → V be a linear transformation and, for each eigenvalue λ
of t, let S(λ) be the set of vectors satisfying t(v) = λv. Then S(λ) is
the eigenspace of t corresponding to the eigenvalue λ.


Worked Exercise C65

Let t : R³ → R³ be the linear transformation given by
t(x, y, z) = (4x + 2y, 2x + 3y + 2z, 2y + 2z).

Find the eigenspace S(0) of t, specify a basis for it and state its dimension.

(You found the eigenvalues and eigenvectors of this linear transformation
in Exercise C120.)


Solution


Any vector in S(0) can be written as k(1, −2, 2), so

{(1, −2, 2)}
is a basis for S(0). Thus S(0) has dimension 1.

Geometrically, S(0) is a line through the origin in the direction of
the vector (1, −2, 2), so the eigenvectors of t corresponding to
λ = 0 are precisely the non-zero vectors on this line.


Exercise C122

Let t : R³ → R³ be the linear transformation given by
t(x, y, z) = (4x + 2y, 2x + 3y + 2z, 2y + 2z).

Find the eigenspaces S(6) and S(3) of t. In each case, specify a basis and
state the dimension of the eigenspace.

(In Exercise C120 you found that the eigenvectors of t are the non-zero
vectors (2k, 2k, k) and (−2k, k, 2k), corresponding to the eigenvalues λ = 6
and λ = 3, respectively.)


Worked Exercise C66

Let t : R³ → R³ be the linear transformation given by

t(x, y, z) = (0, y, z).

Find all the eigenspaces of t. In each case, specify a basis and state the
dimension of the eigenspace.

Solution

The matrix of t with respect to the standard basis for R³ is
$$A = \begin{pmatrix} 0 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}.$$

This matrix is diagonal, so the eigenvalues are the diagonal entries:
$$\lambda = 0, \quad \lambda = 1 \quad\text{and}\quad \lambda = 1.$$

The eigenvector equations are
$$\begin{aligned} -\lambda x &= 0\\ (1-\lambda)y &= 0\\ (1-\lambda)z &= 0. \end{aligned}$$




λ = 0: The eigenvector equations become
$$0x = 0, \quad y = 0 \quad\text{and}\quad z = 0.$$

Thus the eigenvectors corresponding to the eigenvalue λ = 0 are
the non-zero vectors (x, y, z) satisfying y = 0 and z = 0; that is,
the vectors of the form

(k, 0, 0), where k ≠ 0.

The eigenspace S(0) is the set of vectors
{(k, 0, 0) : k ∈ R}.

Any vector in S(0) can be written as k(1, 0, 0), so
{(1, 0, 0)}
is a basis for S(0). Thus S(0) has dimension 1.

Geometrically, S(0) is the x-axis in R³.

λ = 1: The eigenvector equations reduce to the single equation
$$-x = 0.$$

Thus the eigenvectors corresponding to the eigenvalue λ = 1 are
the non-zero vectors (x, y, z) satisfying x = 0; that is, the
vectors of the form

(0, k, l), where k and l are not both 0.

The eigenspace S(1) is the set of vectors
{(0, k, l) : k, l ∈ R}.

Any vector in S(1) can be written as k(0, 1, 0) + l(0, 0, 1), so
{(0, 1, 0), (0, 0, 1)}
is a basis for S(1). Thus S(1) has dimension 2.

Geometrically, S(1) is the plane x = 0 through the origin.
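The dimension of an eigenspace is the number of free unknowns in the eigenvector equations (A − λI)v = 0, which equals n − rank(A − λI). The following sketch computes this with Gauss–Jordan elimination over exact fractions and reproduces the dimensions found in Worked Exercise C66. The helper names rank and eigenspace_dim are hypothetical.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix, by Gauss-Jordan elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0                                   # number of leading entries found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                        # no leading entry in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def eigenspace_dim(A, lam):
    """dim S(λ) = number of free unknowns = n - rank(A - λI)."""
    n = len(A)
    return n - rank([[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)])

A = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]    # matrix of t(x, y, z) = (0, y, z)
print(eigenspace_dim(A, 0), eigenspace_dim(A, 1))   # 1 2
```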


In Worked Exercise C66 the (simplified) characteristic equation of the
linear transformation t is
$$\lambda(\lambda - 1)^2 = 0.$$

The eigenvalue λ = 1 is a ‘repeated’ solution of this characteristic
equation; it is a multiple root, and we say that λ = 1 has multiplicity 2
because the factor (λ − 1) occurs twice.


In general, we adopt the following definition. 




Definition

If the characteristic equation of a square matrix A can be written as
$$(\lambda - \lambda_1)^{m_1}(\lambda - \lambda_2)^{m_2}\cdots(\lambda - \lambda_p)^{m_p} = 0,$$
where λ₁, λ₂, ..., λₚ are distinct, then the eigenvalue λᵢ of A has
multiplicity mᵢ, for i = 1, 2, ..., p.

For a triangular or diagonal matrix, the multiplicity of an eigenvalue is the
number of times it appears on the main diagonal.


Exercise C123

Find the eigenvalues and eigenvectors of the matrix
$$\begin{pmatrix} 1 & 1 & -1\\ 0 & 4 & 0\\ 0 & 0 & 4 \end{pmatrix}.$$

For each eigenvalue λ, state its multiplicity, find the corresponding
eigenspace S(λ), specify a basis for S(λ) and state its dimension.


From the examples that you have seen so far, you may be tempted to
conjecture that the dimension of the eigenspace S(λ), for a given
eigenvalue λ, is equal to the multiplicity of λ. The following exercises give
you the chance to investigate this conjecture.


Exercise C124

Find the eigenvalues and eigenvectors of the matrix
$$\begin{pmatrix} 1 & 1\\ 0 & 1 \end{pmatrix}.$$

For each eigenvalue λ, state its multiplicity, find the corresponding
eigenspace S(λ), specify a basis for S(λ) and state its dimension.


Exercise C125

Find the eigenvalues and eigenvectors of the matrix
$$\begin{pmatrix} 1 & -1 & 0\\ 1 & 4 & 1\\ -1 & 1 & 4 \end{pmatrix}.$$

For each eigenvalue λ, state its multiplicity, find the corresponding
eigenspace S(λ), specify a basis for S(λ) and state its dimension.

Hint: Look for factors in the characteristic equation and remember that
x² − 1 = (x − 1)(x + 1).




In Exercise C124 the eigenvalue λ = 1 has multiplicity 2, but it gives rise
to an eigenspace of dimension only 1. In this case, the matrix represents a
shear in the x-direction by a factor 1, as shown in Figure 4, and the only
line through the origin left unchanged is the x-axis. Thus there is a single
one-dimensional eigenspace, so the conjecture that the dimension of the
eigenspace S(λ) is equal to the multiplicity of λ is false.

Figure 4 A shear in the x-direction by a factor 1

In Exercise C125 both eigenspaces have dimension 1, despite the
eigenvalue 2 having multiplicity 2 and the eigenvalue 5 having
multiplicity 1. In general, it can be shown that the dimension of an
eigenspace cannot exceed the multiplicity of the corresponding eigenvalue,
but we will not prove this.
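For 2 × 2 matrices the count of free unknowns is especially simple. A non-zero 2 × 2 matrix has rank 1 if its determinant is zero and rank 2 otherwise, so dim S(λ) = 2 − rank(A − λI). This sketch contrasts the shear of Exercise C124 with the identity matrix. In both, λ = 1 has multiplicity 2, but the eigenspace dimensions differ. The helper names are hypothetical.

```python
def rank_2x2(M):
    """Rank of a 2 x 2 matrix: 0, 1 or 2."""
    (a, b), (c, d) = M
    if a == b == c == d == 0:
        return 0
    return 1 if a * d - b * c == 0 else 2

def eigenspace_dim_2x2(A, lam):
    """dim S(λ) = 2 - rank(A - λI)."""
    (a, b), (c, d) = A
    return 2 - rank_2x2([[a - lam, b], [c, d - lam]])

shear = [[1, 1], [0, 1]]      # λ = 1 has multiplicity 2
identity = [[1, 0], [0, 1]]   # λ = 1 also has multiplicity 2
print(eigenspace_dim_2x2(shear, 1), eigenspace_dim_2x2(identity, 1))   # 1 2
```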


2 Diagonalising matrices 


In this section you will use the methods of finding eigenvalues and their 
corresponding eigenvectors that you met in the previous section to address 
the question posed in the introduction: 


Is it possible to find a basis for both the domain and codomain so 
that the matrix of a linear transformation is a diagonal matrix? 


It is therefore important that you are confident with the material in 
Section 1 before starting to study this section. 


2.1 Eigenvector bases 


In Section 1 we introduced the notions of an eigenvalue λ and
corresponding eigenvector v of a linear transformation t : Rⁿ → Rⁿ;
that is, a non-zero vector v whose image t(v) is λv. For example, in
Exercise C119(a) you saw that the linear transformation t : R² → R²
given by

t(x, y) = (x + 3y, 2x − 4y)

has eigenvalues λ = −5 and λ = 2 with corresponding eigenvectors the
non-zero vectors of the forms (k, −2k) and (3k, k), respectively. We can
choose any value of k (k ≠ 0) to specify particular eigenvectors; here, putting
k = 1 in both gives (1, −2) and (3, 1). Since (3, 1) is not a multiple of
(1, −2), these two eigenvectors are linearly independent (this is the case
whatever non-zero values of k are chosen). Therefore, by Theorem C25 in
Unit C2, these linearly independent eigenvectors form a basis for R², the
domain and codomain of t. We say that {(1, −2), (3, 1)} is an eigenvector
basis of t.




Definition
Let t : Rⁿ → Rⁿ be a linear transformation and let E be a basis
for Rⁿ consisting of eigenvectors of t. The basis E is an eigenvector
basis of t.


Exercise C126

Verify that {(−2, 1), (1, 2)} is an eigenvector basis of the linear
transformation t : R² → R² given by

t(x, y) = (x − 2y, −2x − 2y).

(In Exercise C119(b) you found that the eigenvectors of t are the non-zero
vectors (−2k, k) and (k, 2k), corresponding to the eigenvalues λ = 2 and
λ = −3, respectively.)


Exercise C127

The set E = {(0, 1, −1), (−2, 1, 0), (1, 0, −1)} is a basis for R³. Verify that
E is an eigenvector basis of the linear transformation t : R³ → R³ given by

t(x, y, z) = (−x + 2y + 2z, 2x + 2y + 2z, −3x − 6y − 6z).


In Unit C3 you met Strategy C15 for finding the matrix representation of a
linear transformation t : V → W with respect to given bases E and F for
the domain and codomain of t. In this subsection you will see that this
matrix representation is particularly simple if W = V, E is an eigenvector
basis of t and F = E.

Recall that if E = {e₁, e₂, ..., eₙ} is a basis for V, and v is a vector in V
such that v = v₁e₁ + ··· + vₙeₙ, then the numbers v₁, ..., vₙ are the
E-coordinates of v, and v_E = (v₁, ..., vₙ)_E is the E-coordinate
representation of v. If E is the standard basis for V, then we usually omit
the suffix E.

We begin by rewriting Strategy C15 for the particular case when W = V
and F = E (not necessarily an eigenvector basis).


Strategy C19 (Strategy C15 with W = V and F = E)

To find the matrix A of a linear transformation t : V → V with
respect to the basis E = {e₁, e₂, ..., eₙ}, do the following.

1. Find t(e₁), t(e₂), ..., t(eₙ).
2. Find the E-coordinates of each of these image vectors.
3. Construct the matrix A column by column using the E-coordinates
of t(e₁), t(e₂), ..., t(eₙ): column j of A is formed from the
E-coordinates of t(eⱼ), for j = 1, 2, ..., n.




In the next worked exercise we illustrate what happens when we find the 
matrix of a linear transformation t with respect to an eigenvector basis of t. 


Worked Exercise C67

Consider the linear transformation t : R² → R² given by

t(x, y) = (x + 3y, 2x − 4y).

(a) Write down the matrix of t with respect to the standard basis for R².

(b) Find the matrix of t with respect to the eigenvector basis

E = {(1, −2), (3, 1)}.

Solution

(a) The matrix of t with respect to the standard basis for R² is
$$\begin{pmatrix} 1 & 3\\ 2 & -4 \end{pmatrix}.$$

(b) Following Strategy C19, first we find the images of the vectors in
the basis E = {(1, −2), (3, 1)}:
$$t(1,-2) = (-5, 10) \quad\text{and}\quad t(3,1) = (6, 2).$$

We now write these image vectors in terms of their
coordinates with respect to the eigenvector basis; that is, we
express each of these vectors as a linear combination of the basis
vectors in E = {(1, −2), (3, 1)}. The resulting calculations are
remarkably straightforward!

Next we find the E-coordinates of each of these image vectors:
$$\begin{aligned} (-5, 10) &= -5(1,-2) + 0(3,1) = (-5, 0)_E,\\ (6, 2) &= 0(1,-2) + 2(3,1) = (0, 2)_E. \end{aligned}$$

Therefore t(1, −2) = (−5, 0)_E and t(3, 1) = (0, 2)_E. So the matrix
of t with respect to the eigenvector basis E is
$$\begin{pmatrix} -5 & 0\\ 0 & 2 \end{pmatrix}.$$
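Strategy C19 can be sketched in code for a basis of R². In this illustrative Python version, the E-coordinates of each image vector are found by solving a 2 × 2 linear system with Cramer's rule. This method is our choice here, not part of the strategy itself, and it assumes that the two basis vectors are linearly independent. The names coords_2d and matrix_wrt_basis are hypothetical.

```python
from fractions import Fraction

def coords_2d(basis, w):
    """E-coordinates (p, q) of w, so that w = p*e1 + q*e2, by Cramer's rule."""
    (a, c), (b, d) = basis      # e1 = (a, c) and e2 = (b, d)
    det = a * d - b * c         # non-zero because E is a basis
    p = Fraction(w[0] * d - b * w[1], det)
    q = Fraction(a * w[1] - c * w[0], det)
    return p, q

def matrix_wrt_basis(t, basis):
    """Strategy C19 for R^2: column j holds the E-coordinates of t(e_j)."""
    cols = [coords_2d(basis, t(e)) for e in basis]             # steps 1 and 2
    return [[cols[j][i] for j in range(2)] for i in range(2)]  # step 3

def t(v):
    """The linear transformation t(x, y) = (x + 3y, 2x - 4y)."""
    x, y = v
    return (x + 3 * y, 2 * x - 4 * y)

E = [(1, -2), (3, 1)]
print(matrix_wrt_basis(t, E) == [[-5, 0], [0, 2]])   # True
```

With the standard basis [(1, 0), (0, 1)] the same function returns the matrix in part (a).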


In Worked Exercise C67(b) we found that the matrix of t with respect to
the eigenvector basis is diagonal and that its diagonal entries are the
eigenvalues of the linear transformation t. This is because the columns of
this matrix record the E-coordinates of the images under t of the basis
vectors, and these basis vectors are precisely eigenvectors, which t maps to
multiples of themselves. You should find a similar outcome in the next
exercise.


Exercise C128

Consider the linear transformation t : R² → R² given by

t(x, y) = (x − 2y, −2x − 2y).

(a) Write down the matrix of t with respect to the standard basis for R².

(b) Find the matrix of t with respect to the eigenvector basis

E = {(−2, 1), (1, 2)},

which you found in Exercise C126.


Worked Exercise C67(b) and Exercise C128(b) are special cases of the 
following result. We use the letter D in this result because the matrix is 
diagonal. 


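Theorem C59

Let t : Rⁿ → Rⁿ be a linear transformation, let E = {e₁, e₂, ..., eₙ}
be an eigenvector basis of t and let t(eⱼ) = λⱼeⱼ, for j = 1, 2, ..., n.
Then the matrix of t with respect to the eigenvector basis E is
$$D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$

Proof Let t and E be as in the statement of the theorem. We use
Strategy C19 to find the matrix of t with respect to the eigenvector
basis E.

Eigenvector eⱼ corresponds to eigenvalue λⱼ.
We have
$$t(\mathbf{e}_j) = \lambda_j\mathbf{e}_j, \quad\text{for } j = 1, 2, \ldots, n.$$
We find the E-coordinates of each of these image vectors:
$$\begin{aligned} t(\mathbf{e}_1) &= \lambda_1\mathbf{e}_1 + 0\mathbf{e}_2 + \cdots + 0\mathbf{e}_n = (\lambda_1, 0, \ldots, 0)_E,\\ t(\mathbf{e}_2) &= 0\mathbf{e}_1 + \lambda_2\mathbf{e}_2 + \cdots + 0\mathbf{e}_n = (0, \lambda_2, \ldots, 0)_E,\\ &\;\;\vdots\\ t(\mathbf{e}_n) &= 0\mathbf{e}_1 + 0\mathbf{e}_2 + \cdots + \lambda_n\mathbf{e}_n = (0, 0, \ldots, \lambda_n)_E. \end{aligned}$$

So the matrix of t with respect to the eigenvector basis E is
$$D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n \end{pmatrix},$$
as claimed. ■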


Using this result we can easily write down the matrix of a linear 
transformation with respect to an eigenvector basis. 


Exercise C129

Consider the linear transformation t : R³ → R³ given by
t(x, y, z) = (−x + 2y + 2z, 2x + 2y + 2z, −3x − 6y − 6z),
with eigenvector basis
E = {(0, 1, −1), (−2, 1, 0), (1, 0, −1)}.

Use the solution to Exercise C127 to write down the matrix of t with
respect to this eigenvector basis.


2.2 Transition matrices 


Suppose that t : Rⁿ → Rⁿ is a linear transformation and E is an
eigenvector basis of t. We have just shown that the matrix of t with
respect to the eigenvector basis E is a diagonal matrix D.

Figures 5 and 6 show the linear transformation t with respect to the
eigenvector basis E and the standard basis, respectively.

Figure 5 The linear transformation t with eigenvector basis E for the
domain and codomain (t : v_E ↦ Dv_E)

Figure 6 The linear transformation t with the standard basis for the domain
and codomain (t : v ↦ Av)


It is natural to ask whether there is any relationship between this matrix D
and the matrix A of t with respect to the standard basis for Rⁿ. It turns
out that there is an algebraic relationship between the matrices D and A.

We now show this relationship. To do this, first we find an algebraic
relationship between the E-coordinate representation v_E of a vector (as in
Figure 5) and the standard coordinate representation v of the same vector
(as in Figure 6). We begin by doing this for the example that we
considered at the beginning of the section, where t : R² → R² is the linear
transformation given by

t(x, y) = (x + 3y, 2x − 4y)

and E is the eigenvector basis {(1, −2), (3, 1)}.

Suppose that the E-coordinate representation of a vector v in R² is
v_E = (a, b)_E.

What are the standard coordinates of v?

In column form,
$$\mathbf{v} = a\begin{pmatrix} 1\\ -2 \end{pmatrix} + b\begin{pmatrix} 3\\ 1 \end{pmatrix} = \begin{pmatrix} a + 3b\\ -2a + b \end{pmatrix} = \begin{pmatrix} 1 & 3\\ -2 & 1 \end{pmatrix}\begin{pmatrix} a\\ b \end{pmatrix}_E.$$
Thus in matrix form we have
$$\mathbf{v} = P\mathbf{v}_E,$$
where
$$P = \begin{pmatrix} 1 & 3\\ -2 & 1 \end{pmatrix}.$$

Now, by the Summary Theorem (Theorem C19 in Unit C1), a square
matrix is invertible if and only if its determinant is non-zero. Here we have
det P = 1 − (−6) = 7 ≠ 0, so P is invertible with inverse P⁻¹.


Since v = Pv_E, it follows that
$$P^{-1}\mathbf{v} = P^{-1}(P\mathbf{v}_E) = (P^{-1}P)\mathbf{v}_E = \mathbf{v}_E.$$

So multiplication on the left by the matrix P converts the E-coordinate
representation of a vector into the standard coordinate representation and,
similarly, multiplication on the left by the matrix P⁻¹ converts the
standard coordinate representation of a vector into the E-coordinate
representation.
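The two conversions v = Pv_E and v_E = P⁻¹v can be tried out directly. This minimal sketch uses the matrix P above and the usual formula for the inverse of a 2 × 2 matrix. The E-coordinates (2, −1) are an arbitrary choice, and the helper names are hypothetical.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix ((a, b), (c, d))."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def mat_vec(M, v):
    """The product Mv, with v written as a list."""
    return [sum(M[r][s] * v[s] for s in range(len(v))) for r in range(len(M))]

P = [[1, 3], [-2, 1]]        # columns: the vectors (1, -2) and (3, 1) of E
v_E = [2, -1]                # E-coordinates (a, b) = (2, -1), chosen arbitrarily
v = mat_vec(P, v_E)          # standard coordinates (a + 3b, -2a + b)
print(v)                                   # [-1, -5]
print(mat_vec(inverse_2x2(P), v) == v_E)   # True
```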




In this case the columns of P are formed from the standard coordinates of
the vectors in E, and this is no coincidence. This simple relationship
between the matrix P and the basis E always holds, and we call P the
transition matrix from the basis E to the standard basis for R².

The general definition is as follows.

Definition
Let E = {e₁, e₂, ..., eₙ} be a basis for Rⁿ. The transition matrix P
from the basis E to the standard basis for Rⁿ is the matrix whose jth
column is formed from the standard coordinates of eⱼ.


Exercise C130

(a) Write down the transition matrix P from the basis E = {(1, 3), (2, 5)}
to the standard basis for R².

(b) Write down the transition matrix P from the basis E = {(0, 1, −1),
(−2, 1, 0), (1, 0, −1)} to the standard basis for R³.


In the example above, we have seen that the transition matrix P from the
basis E = {(1, −2), (3, 1)} to the standard basis for R² converts
E-coordinate representations into standard coordinate representations, and
that P⁻¹ converts standard coordinate representations into E-coordinate
representations. This is true in general.

Theorem C60

Let E = {e₁, e₂, ..., eₙ} be a basis for Rⁿ and let P be the transition
matrix from the basis E to the standard basis for Rⁿ. Then the
standard coordinate representation of a vector v in Rⁿ is given by
$$\mathbf{v} = P\mathbf{v}_E.$$
Moreover, P is invertible and
$$\mathbf{v}_E = P^{-1}\mathbf{v}.$$


Proof The matrix P converts the E-coordinate representation of a
vector in Rⁿ to the standard coordinate representation of the same vector
in Rⁿ, so in effect it is the matrix of the identity linear transformation
i : Rⁿ → Rⁿ with respect to the basis E in the domain and the standard
basis in the codomain.

The statement v = Pv_E is equivalent to the statement that P is the
matrix of the identity transformation i of Rⁿ with respect to the basis E
for the domain and the standard basis for the codomain.

To find this matrix P, we use Strategy C15 from Unit C3. We begin by
finding the images under i of the vectors in the domain basis E:
$$i(\mathbf{e}_1) = \mathbf{e}_1, \quad i(\mathbf{e}_2) = \mathbf{e}_2, \quad \ldots, \quad i(\mathbf{e}_n) = \mathbf{e}_n.$$

It now follows from Strategy C15 that each column of P is formed from
the standard coordinates of the corresponding basis vector, so P is the
transition matrix from the basis E to the standard basis for Rⁿ, as claimed.

We know that the identity transformation i is invertible and that i⁻¹ = i.
It follows from the Inverse Rule (Theorem C45 in Unit C3) that P is
invertible and that P⁻¹ is the matrix of i : Rⁿ → Rⁿ with respect to the
standard basis for the domain and the basis E for the codomain; that is,
$$\mathbf{v} \mapsto \mathbf{v}_E = P^{-1}\mathbf{v}. \qquad\blacksquare$$

When E is the standard basis for Rⁿ, the matrix P is the identity
matrix Iₙ, as you would expect.


We also get the following corollary from Theorem C60. 


Corollary C61

The rows or columns of an n × n matrix A form a set of n linearly
independent vectors if and only if det A ≠ 0.

Proof Let A be an n × n matrix.
We start by proving the ‘only if’ part.

We first show that if the columns of A are linearly independent, then
det A ≠ 0.

Suppose that the columns are linearly independent. Then the columns form a
basis for Rⁿ and A is the transition matrix from this basis to the standard
basis. Hence A is invertible by Theorem C60, and so det A ≠ 0 by the
Summary Theorem (Theorem C19 in Unit C1).

If the rows of A are linearly independent, then we consider the
transpose Aᵀ.

Suppose that the rows of A are linearly independent. Then the columns of Aᵀ
are linearly independent and det Aᵀ ≠ 0 by the above reasoning. We have
det A = det Aᵀ by Theorem C14 in Unit C1, and hence det A ≠ 0, as
required.

We now prove the ‘if’ part using the contrapositive; that is, we show
that if the rows or columns of A are not linearly independent, then
det A = 0.

Suppose that the rows of A form a linearly dependent set. Then the row-reduced
form of A contains a zero row, so A is not invertible by the Invertibility
Theorem (Theorem C7 in Unit C1), and hence det A = 0 by the Summary
Theorem.


Figure 8 The transition in three steps (Step 1, Step 2, Step 3)


Suppose that the columns of A form a linearly dependent set. Then the rows of
Aᵀ are linearly dependent, and det A = det Aᵀ = 0 by the above reasoning.

Hence, if det A ≠ 0, then the rows of A form a linearly independent set of
vectors, and so do the columns. ■
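Corollary C61 gives a practical test for linear independence. This sketch forms the matrix whose columns are the given vectors and computes its 3 × 3 determinant. A non-zero value confirms, for example, that the set E of Exercise C127 is a basis for R³. A made-up set in which one vector is a multiple of another gives 0. The helper names are hypothetical.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix, expanding along the top row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def columns_to_matrix(vectors):
    """Matrix whose jth column is the jth vector (a transition matrix)."""
    return [[v[r] for v in vectors] for r in range(len(vectors[0]))]

E = [(0, 1, -1), (-2, 1, 0), (1, 0, -1)]        # the basis of Exercise C127
print(det3(columns_to_matrix(E)))               # -1: linearly independent

dependent = [(1, 2, 3), (2, 4, 6), (0, 1, 1)]   # second vector is twice the first
print(det3(columns_to_matrix(dependent)))       # 0: linearly dependent
```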


Recall that our aim in this subsection is to relate the matrices D and A,
where D is the matrix of a linear transformation t : Rⁿ → Rⁿ with
respect to an eigenvector basis of t, and A is the matrix of t with respect
to the standard basis for Rⁿ. Figure 7 shows how we can do this by using
the transition matrix P from the eigenvector basis E to the standard basis
for Rⁿ, so linking together Figures 5 and 6.

Figure 7 The transition matrix P from the eigenvector basis E of t to the
standard basis for Rⁿ


The top line of the diagram shows that multiplication by D converts the
E-coordinate representation of v to the E-coordinate representation
of t(v):
$$t(\mathbf{v})_E = D\mathbf{v}_E. \qquad (3)$$
The diagram also shows that this change can be achieved in another way,
in three steps, highlighted in Figure 8.

1. Use the transition matrix P to convert the E-coordinate
representation of v to the standard coordinate representation of v:
$$\mathbf{v} = P\mathbf{v}_E.$$

2. Multiply v on the left by the matrix A to obtain the standard coordinate
representation of t(v):
$$t(\mathbf{v}) = A\mathbf{v} = AP\mathbf{v}_E.$$

3. Use the matrix P⁻¹ to convert the standard coordinate representation
of t(v) to the E-coordinate representation of t(v):
$$t(\mathbf{v})_E = P^{-1}t(\mathbf{v}) = P^{-1}AP\mathbf{v}_E.$$

Comparing this last equation with equation (3), which both hold for every
vector v in Rⁿ, we see that D, A and P are related by the equation
$$D = P^{-1}AP.$$


Thus we have proved the following result. 


Theorem C62

Let t : Rⁿ → Rⁿ be a linear transformation and let E be an
eigenvector basis of t. Let A be the matrix of t with respect to the
standard basis for Rⁿ, let D be the matrix of t with respect to the
eigenvector basis E and let P be the transition matrix from E to the
standard basis for Rⁿ. Then
$$D = P^{-1}AP.$$

In fact, Theorem C62 holds for any basis E for Rⁿ, although D is diagonal
only when E is an eigenvector basis.


Since D, A, P and P⁻¹ are all square n × n matrices, we can multiply
D = P⁻¹AP on the left by the matrix P and on the right by the matrix
P⁻¹ to obtain the related equation
$$A = PDP^{-1}.$$

This algebraic relationship A = PDP⁻¹ may remind you of the algebraic
relationship
$$y = g \circ x \circ g^{-1}$$
between conjugate permutations x and y in the symmetric group Sₙ, which
you met in Subsection 4.1 of Unit B3. You saw in Unit C1 that the set of
invertible n × n matrices forms a group under multiplication, and
here the change of basis is in some sense equivalent to the ‘renaming’ in
permutations. The matrices D and A are conjugate matrices: we will not
use this concept here, but you will meet this idea of conjugacy in groups
again in Book E.


We end this subsection by applying Theorem C62 to some examples. 


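Consider the linear transformation t : R² → R² given by

t(x, y) = (x + 3y, 2x − 4y).

In Worked Exercise C67 you saw that
$$A = \begin{pmatrix} 1 & 3\\ 2 & -4 \end{pmatrix}$$
is the matrix of t with respect to the standard basis for R² and that
$$D = \begin{pmatrix} -5 & 0\\ 0 & 2 \end{pmatrix}$$
is the matrix of t with respect to the eigenvector basis E = {(1, −2), (3, 1)}.

At the beginning of this subsection you saw that the transition matrix
from the basis E to the standard basis for R² is
$$P = \begin{pmatrix} 1 & 3\\ -2 & 1 \end{pmatrix}.$$

Now, using Strategy C4 from Unit C1, we have
$$P^{-1} = \frac{1}{7}\begin{pmatrix} 1 & -3\\ 2 & 1 \end{pmatrix} = \begin{pmatrix} \frac{1}{7} & -\frac{3}{7}\\ \frac{2}{7} & \frac{1}{7} \end{pmatrix},$$
so
$$P^{-1}AP = \begin{pmatrix} \frac{1}{7} & -\frac{3}{7}\\ \frac{2}{7} & \frac{1}{7} \end{pmatrix}\begin{pmatrix} 1 & 3\\ 2 & -4 \end{pmatrix}\begin{pmatrix} 1 & 3\\ -2 & 1 \end{pmatrix} = \begin{pmatrix} -5 & 0\\ 0 & 2 \end{pmatrix} = D,$$
as claimed.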
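A minimal sketch that repeats this check with exact rational arithmetic. Here mat_mul is a hypothetical helper that multiplies matrices stored as lists of rows.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[1, 3], [2, -4]]
P = [[1, 3], [-2, 1]]
P_inv = [[Fraction(1, 7), Fraction(-3, 7)],
         [Fraction(2, 7), Fraction(1, 7)]]

assert mat_mul(P_inv, P) == [[1, 0], [0, 1]]     # P_inv really is the inverse of P
D = mat_mul(mat_mul(P_inv, A), P)
print(D == [[-5, 0], [0, 2]])                    # True
```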


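Exercise C131

Use the solution to Exercise C128 to find a matrix P such that
D = P⁻¹AP, where
$$A = \begin{pmatrix} 1 & -2\\ -2 & -2 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 2 & 0\\ 0 & -3 \end{pmatrix}.$$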


2.3 Diagonalisation 


In this subsection you will consider the problem of determining when a 
matrix is diagonalisable and how to diagonalise a matrix when it is 
possible. 


Definition

The matrix A is diagonalisable if there exists an invertible matrix P
such that the matrix
$$D = P^{-1}AP$$
is diagonal.

Clearly the matrices A, D and P must all be square matrices of the same
size.


If a matrix A is diagonalisable, then to diagonalise it we need to find both
the diagonal matrix D and the invertible matrix P, since it is this
transition matrix P that links the matrix A with the diagonal matrix D.

One particular use of diagonalisation of matrices is to find powers of
matrices. We saw earlier that multiplying D = P⁻¹AP on the left by P
and on the right by P⁻¹ gives A = PDP⁻¹. Now consider powers of A:
$$\begin{aligned} A^2 &= (PDP^{-1})(PDP^{-1})\\ &= PD(P^{-1}P)DP^{-1}\\ &= PDDP^{-1}\\ &= PD^2P^{-1}, \end{aligned}$$
and, in general, we have
$$A^n = PD^nP^{-1}, \quad\text{for } n = 1, 2, \ldots.$$

This last equation is useful for calculating powers of matrices, since
calculating the nth power of a diagonal matrix is particularly simple: you
need to find only the nth power of each diagonal entry. But first we need
to be able to find both D and P (from which we can find P⁻¹).
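The formula Aⁿ = PDⁿP⁻¹ can be compared with repeated multiplication. To avoid giving away Exercise C132, this sketch uses A, P, P⁻¹ and the eigenvalues −5 and 2 from Worked Exercise C67 and the calculation above. The function names are hypothetical.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def power_direct(A, n):
    """A^n by repeated multiplication, for n = 1, 2, ...."""
    result = A
    for _ in range(n - 1):
        result = mat_mul(result, A)
    return result

def power_diag(P, eigenvalues, P_inv, n):
    """A^n = P D^n P^-1, where D^n raises each diagonal entry of D to the nth power."""
    m = len(eigenvalues)
    Dn = [[eigenvalues[i] ** n if i == j else 0 for j in range(m)] for i in range(m)]
    return mat_mul(mat_mul(P, Dn), P_inv)

A = [[1, 3], [2, -4]]
P = [[1, 3], [-2, 1]]
P_inv = [[Fraction(1, 7), Fraction(-3, 7)], [Fraction(2, 7), Fraction(1, 7)]]

for n in range(1, 8):
    assert power_diag(P, [-5, 2], P_inv, n) == power_direct(A, n)
print(power_direct(A, 3))   # [[-11, 57], [38, -106]]
```

For large n the diagonal route needs only two matrix products, whereas repeated multiplication needs n − 1.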


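Exercise C132

(a) Write down D⁵, where
$$D = \begin{pmatrix} 2 & 0\\ 0 & -3 \end{pmatrix}.$$

(b) Calculate A⁵, where
$$A = \begin{pmatrix} 1 & -2\\ -2 & -2 \end{pmatrix}.$$

(In Exercise C131 you found that
$$P = \begin{pmatrix} -2 & 1\\ 1 & 2 \end{pmatrix}$$
satisfies D = P⁻¹AP.)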


If A is any n × n matrix, then we can define a linear transformation t as:
$$t : \mathbb{R}^n \to \mathbb{R}^n, \quad \mathbf{v} \mapsto A\mathbf{v}.$$

In Section 1 we said that a non-zero vector v is an eigenvector of A with
corresponding eigenvalue λ if t(v) = Av = λv; that is, if v is an eigenvector of t.

Definition
Let A be an n × n matrix and let E = {e₁, e₂, ..., eₙ} be a basis
for Rⁿ consisting of eigenvectors of A. The basis E is an eigenvector
basis of A.

Thus E is an eigenvector basis of A if E is an eigenvector basis of t.


2  Diagonalising matrices 




Unit C4 Eigenvectors




Worked Exercise C68 


Find an eigenvector basis of the matrix 


A=(5 A 


Suppose that E is an eigenvector basis of the n × n matrix A; that is, E is
an eigenvector basis of the linear transformation $t : \mathbb{R}^n \to \mathbb{R}^n$ given by

$$t(v) = Av.$$

It follows from Theorems C59 and C62 that if P is the transition matrix
from the basis E to the standard basis for $\mathbb{R}^n$, then

$$D = P^{-1}AP$$

is diagonal; that is, A is diagonalisable. This gives the following strategy
for diagonalising a matrix, when this is possible.


Strategy C20 

To diagonalise an n x n matrix A: 

1. find all the eigenvalues of A 

2. find (if possible) an eigenvector basis E = {e1,e2,...,en} of A 


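3. write down the transition matrix P whose jth column is formed
from the standard coordinates of $e_j$.

Then

$$P^{-1}AP = D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix},$$

where $\lambda_j$ is the eigenvalue corresponding to the eigenvector $e_j$.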


The order of the eigenvalues down the diagonal of D must match the order
of the eigenvectors in the basis E used to construct the transition
matrix P. When asked to diagonalise a matrix, it is not enough to write
down a diagonal matrix containing the eigenvalues: you must also give the
transition matrix P.
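Steps 2 and 3 of Strategy C20 can be sketched in Python as follows, assuming that the eigenvalues and eigenvectors have already been found (step 1 is not automated here). The sketch uses the matrix of Worked Exercise C70, whose eigenvectors are found later in this subsection; `inverse` and `diagonalise` are hypothetical helper names. It computes $P^{-1}AP$ exactly, and reordering the eigenpairs shows that the diagonal of D follows the column order of P.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    # Augment M with the identity matrix: [M | I].
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Clear every other entry in this column.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]   # the right half is now M^{-1}


def diagonalise(A, eigenpairs):
    """Steps 2-3 of Strategy C20; eigenpairs is a list of (eigenvalue, eigenvector)."""
    n = len(A)
    # Column j of P holds the standard coordinates of the jth eigenvector.
    P = [[eigenpairs[j][1][i] for j in range(n)] for i in range(n)]
    D = matmul(matmul(inverse(P), A), P)
    return P, D


A = [[4, 2, 2], [2, 4, 2], [2, 2, 4]]
pairs = [(8, (1, 1, 1)), (2, (1, 0, -1)), (2, (0, 1, -1))]

P, D = diagonalise(A, pairs)
print(D == [[8, 0, 0], [0, 2, 0], [0, 0, 2]])   # True

# Swapping the first two eigenpairs swaps the columns of P and the entries of D.
P2, D2 = diagonalise(A, [pairs[1], pairs[0], pairs[2]])
print(D2 == [[2, 0, 0], [0, 8, 0], [0, 0, 2]])  # True
```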


The complexity involved in finding an eigenvector basis of A in step 2 of 
Strategy C20 depends on the matrix A. In Worked Exercise C68 we 
formed an eigenvector basis of A by taking one eigenvector corresponding 
to each eigenvalue, ensuring that the eigenvectors were linearly 
independent. In general, we have the following result, which we will prove 
at the end of this subsection after looking at how it can be used. This 
result means that, when A has n distinct eigenvalues, any eigenvector can be
chosen for each eigenvalue and there is no need to check that they are
linearly independent.


Theorem C63 


Let A be an n × n matrix with distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ and
corresponding eigenvectors $e_1, e_2, \ldots, e_n$. Then $E = \{e_1, e_2, \ldots, e_n\}$ is
an eigenvector basis of A.
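A quick numerical illustration of Theorem C63 (not a proof) relies on the fact that the columns of a square matrix are linearly independent exactly when its determinant is non-zero. The Python sketch below takes the eigenvectors of the matrix in Exercise C133 (for its distinct eigenvalues 6, 3 and 0), scales each by an arbitrary non-zero factor, and checks that the matrix with these vectors as columns has non-zero determinant every time. The particular scale factors are arbitrary choices, and `det3` and the other helper names are hypothetical.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def columns_to_matrix(vectors):
    """Matrix (list of rows) whose jth column is the jth vector."""
    return [[v[i] for v in vectors] for i in range(3)]


def scaled_eigenvectors(k, l, m):
    """Eigenvectors for lambda = 6, 3, 0 of the matrix in Exercise C133."""
    return [(2 * k, 2 * k, k), (-2 * l, l, 2 * l), (m, -2 * m, 2 * m)]


for k, l, m in [(1, 1, 1), (-3, 2, 5), (7, -1, -4)]:
    d = det3(columns_to_matrix(scaled_eigenvectors(k, l, m)))
    # Scaling a column scales the determinant, so d = 27*k*l*m, never 0.
    print((k, l, m), d, d != 0)
```

Whatever non-zero multiples are chosen, the determinant stays non-zero, which is the behaviour Theorem C63 predicts for eigenvectors belonging to distinct eigenvalues.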


We give an example of how Theorem C63 can be used. 


Worked Exercise C69 


Diagonalise the matrix 




It follows from Theorem C63 that we can form an eigenvector basis
of A by taking one eigenvector corresponding to each of the three
distinct eigenvalues. For example,

$$E = \{(1, 2, 1), (0, 1, 0), (1, 4, -1)\}$$

is an eigenvector basis of A.


We use the eigenvectors in E to form the columns of the transition
matrix:

$$P = \begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 4 \\ 1 & 0 & -1 \end{pmatrix}.$$

Remember that the eigenvalues in D must appear in the same
order as the corresponding eigenvectors in P.


We use the eigenvalues corresponding to the eigenvectors in E to form 
the diagonal matrix: 


If the eigenvectors had been chosen in a different order, then the
order of the columns of the transition matrix P and the order of the 
diagonal entries of the resulting matrix D would have been different. 


In addition, other transition matrices arise from using different 
eigenvectors for the eigenvector basis. 


Another solution is 


2O (Le eel 
P'AP. D orol where P=|2 —4 2 
GU g OT 


Both the order of the eigenvalues and the eigenvectors chosen for the
columns of P differ here.


Exercise C133 


Diagonalise the matrix 


$$A = \begin{pmatrix} 4 & 2 & 0 \\ 2 & 3 & 2 \\ 0 & 2 & 2 \end{pmatrix}.$$

(In Exercise C120 you found that the eigenvectors of A are the non-zero
vectors $(2k, 2k, k)$, $(-2k, k, 2k)$ and $(k, -2k, 2k)$, corresponding to the
eigenvalues λ = 6, λ = 3 and λ = 0, respectively.)


It may be possible to find an eigenvector basis of an n x n matrix A even 
when A does not have n distinct eigenvalues. 


Strategy C21 

To find an eigenvector basis of an n x n matrix A: 

1. find a basis for each eigenspace of A 

2. form the set E of all the basis vectors found in step 1. 


If there are n vectors in E, then E is an eigenvector basis of A;
otherwise E is not a basis.


The fact that E, as found in Strategy C21, is an eigenvector basis of A if
and only if there are n vectors in E can be proved in a similar way to
Theorem C63, but the details are more complicated.
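Strategy C21 can be sketched in Python under the assumption that the distinct eigenvalues are already known. Each eigenspace $S(\lambda)$ is the solution set of $(A - \lambda I)v = 0$, so a basis for it can be read off from the reduced row echelon form of $A - \lambda I$, with one basis vector per free variable. The sketch applies this to the matrix A of Worked Exercise C70 and to the matrix B of Exercise C125 (discussed at the end of this subsection). The function names are hypothetical, and the bases returned may differ from those in the text by the choice of free variables.

```python
from fractions import Fraction


def null_space_basis(M):
    """Basis for {v : Mv = 0}, read off from the reduced row echelon form of M."""
    rows, cols = len(M), len(M[0])
    R = [[Fraction(x) for x in row] for row in M]
    pivot_cols = []
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot here: x_c is a free variable
        R[r], R[pivot] = R[pivot], R[r]
        p = R[r][c]
        R[r] = [x / p for x in R[r]]
        for i in range(rows):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivot_cols.append(c)
        r += 1
    basis = []
    for free in (c for c in range(cols) if c not in pivot_cols):
        v = [Fraction(0)] * cols
        v[free] = Fraction(1)             # one free variable 1, the others 0
        for i, pc in enumerate(pivot_cols):
            v[pc] = -R[i][free]           # solve for each pivot variable
        basis.append(v)
    return basis


def eigenvector_basis(A, eigenvalues):
    """Strategy C21: combine bases of the eigenspaces S(lambda) = null space of A - lambda*I."""
    n = len(A)
    E = []
    for lam in eigenvalues:
        shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
                   for i in range(n)]
        E.extend(null_space_basis(shifted))
    return E   # an eigenvector basis of A exactly when len(E) == n


A = [[4, 2, 2], [2, 4, 2], [2, 2, 4]]      # Worked Exercise C70
B = [[1, -1, 0], [1, 4, 1], [-1, 1, 4]]   # Exercise C125
print(len(eigenvector_basis(A, [8, 2])))  # 3, so A has an eigenvector basis
print(len(eigenvector_basis(B, [2, 5])))  # 2, so B has none
```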


Worked Exercise C70 


Diagonalise the matrix 


$$A = \begin{pmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{pmatrix}.$$




Subtracting the second equation from the first, we obtain
$-6x + 6y = 0$, which implies that $x = y$. Substituting this into
the third equation, we obtain $4x - 4z = 0$, which implies that
$x = z$.


Thus $S(8) = \{(k, k, k) : k \in \mathbb{R}\}$.

λ = 2: All three eigenvector equations become
$2x + 2y + 2z = 0$,
that is, $x + y + z = 0$, so $z = -(x + y)$.
Thus $S(2) = \{(k, l, -(k + l)) : k, l \in \mathbb{R}\}$.


Any vector in S(8) can be written as $k(1, 1, 1)$, and any vector in
S(2) can be written as $k(1, 0, -1) + l(0, 1, -1)$.


A basis for S(8) is $\{(1, 1, 1)\}$ and a basis for S(2) is
$\{(1, 0, -1), (0, 1, -1)\}$. The set

$$E = \{(1, 1, 1), (1, 0, -1), (0, 1, -1)\}$$

contains three vectors, so it is an eigenvector basis of A.


Note that Strategy C21 does not require us to prove linear
independence of the vectors in E: combining the bases of the
eigenspaces S(2) and S(8) gives a set of linearly independent
vectors.


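We use the eigenvectors in E to form the columns of the transition
matrix:

$$P = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & -1 & -1 \end{pmatrix}.$$

We use the eigenvalues corresponding to the eigenvectors in E to form
the diagonal matrix:

$$P^{-1}AP = D = \begin{pmatrix} 8 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$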


Exercise C134 


Diagonalise the matrix 


$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix}.$$




If the matrix A does not have an eigenvector basis, then these methods 
cannot be applied and the matrix A is not diagonalisable — there is no 
transition matrix. For example, in Exercise C124 you saw that all the 
eigenvectors of the matrix 


a=(1)) 


are non-zero vectors of the form (k,0). Any two eigenvectors of A are 
linearly dependent, so there is no eigenvector basis. Thus there is no 
transition matrix and A is not diagonalisable. 


Similarly the matrix 


$$B = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 4 & 1 \\ -1 & 1 & 4 \end{pmatrix}$$

from Exercise C125 is also not diagonalisable. The eigenvectors
corresponding to the eigenvalue λ = 2 of multiplicity 2 are the non-zero
vectors of the form $(k, -k, k)$, so any two eigenvectors of B in S(2) are
linearly dependent. The other eigenvalue λ = 5 has multiplicity 1. As
stated at the end of Section 1, the dimension of an eigenspace cannot
exceed the multiplicity of the corresponding eigenvalue, and so there
cannot be two linearly independent eigenvectors corresponding to the
eigenvalue λ = 5.


Therefore there is no set of three linearly independent eigenvectors and 
thus no eigenvector basis; there is no transition matrix and thus B is not 
diagonalisable. 


We have shown that, if the matrix A of a linear transformation t has an 
eigenvector basis, then using this basis for both the domain and codomain 
results in a matrix of t that is a diagonal matrix. On the other hand, if 
there is an eigenvalue of multiplicity m for which there are fewer than m 
linearly independent eigenvectors, then there is no eigenvector basis and
the matrix A is not diagonalisable.


We end this section by proving Theorem C63 as promised. 


Theorem C63 

Let A be an n × n matrix with distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ and
corresponding eigenvectors $e_1, e_2, \ldots, e_n$. Then $E = \{e_1, e_2, \ldots, e_n\}$ is
an eigenvector basis of A.


Proof Let A and E be as in the statement of the theorem. 


Since any linearly independent set of n vectors in $\mathbb{R}^n$ is a basis for $\mathbb{R}^n$,
by Theorem C25 in Unit C2, we need show only that E is linearly
independent. To do this, we assume that E is linearly dependent and
obtain a contradiction.


If E is linearly independent, then E must be an eigenvector basis of A. 




Suppose to the contrary that E is linearly dependent. Then we can take
the smallest value of m ($2 \le m \le n$) for which a set of m vectors in E is
linearly dependent. By relabelling the eigenvectors (if necessary), we can
write

$$a_1e_1 + a_2e_2 + \cdots + a_me_m = 0, \tag{4}$$

with $a_1 \neq 0, a_2 \neq 0, \ldots, a_m \neq 0$ (if any of these coefficients were zero, a
smaller set of vectors in E would be linearly dependent).

Multiplying both sides of equation (4) by the matrix A, we obtain

$$A(a_1e_1 + a_2e_2 + \cdots + a_me_m) = A0,$$

that is,

$$a_1Ae_1 + a_2Ae_2 + \cdots + a_mAe_m = 0.$$

Now, $e_1, e_2, \ldots, e_m$ are eigenvectors of A with corresponding eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_m$, so $Ae_i = \lambda_ie_i$ and

$$a_1\lambda_1e_1 + a_2\lambda_2e_2 + \cdots + a_m\lambda_me_m = 0. \tag{5}$$

We now eliminate the vector $e_m$. To do this, we multiply equation (4) by
$\lambda_m$ and subtract the result from equation (5):

$$a_1(\lambda_1 - \lambda_m)e_1 + a_2(\lambda_2 - \lambda_m)e_2 + \cdots + a_{m-1}(\lambda_{m-1} - \lambda_m)e_{m-1} = 0.$$

Since the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$ are distinct, and none of the numbers
$a_1, a_2, \ldots, a_{m-1}$ is zero, each coefficient $a_i(\lambda_i - \lambda_m)$ is non-zero, so we
deduce that the set of $m - 1$ vectors $\{e_1, e_2, \ldots, e_{m-1}\}$ is linearly dependent.
This, however, is impossible since we assumed that m is the smallest number
such that a set of m vectors in E is linearly dependent. This contradiction
establishes the result. ■


3 Symmetric matrices 


In this section you will concentrate on diagonalising symmetric matrices. 
You will see that such matrices are always diagonalisable and that their 
transition matrices can be chosen to have particular properties. 


3.1 Diagonalising symmetric matrices 


Suppose that A is an n × n matrix and that we can find a basis
$\{e_1, e_2, \ldots, e_n\}$ for $\mathbb{R}^n$ consisting of eigenvectors of A. In Section 2 you saw
that A can be diagonalised: if P is the transition matrix whose columns
are formed from the coordinates of the eigenvectors $e_1, e_2, \ldots, e_n$, then

$$P^{-1}AP$$

is a diagonal matrix.


In this section you will see that whenever A is an n × n symmetric matrix
(a matrix where $A^T = A$), we can always find a basis for $\mathbb{R}^n$ made up
of eigenvectors of A, and so such a matrix is always diagonalisable. In fact,
we can always find an orthonormal basis for $\mathbb{R}^n$ made up of eigenvectors
of A. Recall from Subsection 5.4 of Unit C2 that an orthonormal basis
consists of mutually perpendicular (orthogonal) vectors of magnitude 1.
For example, the standard basis for $\mathbb{R}^n$ is an orthonormal basis.


When we have an orthonormal basis, it turns out that the inverse of the 
transition matrix P is actually the transpose of P; that is, $P^{-1} = P^T$. This
can be useful since finding the transpose of a matrix is much simpler than 
finding the inverse. We will prove this result as Theorem C65 in the next 
subsection where you will also see that orthogonal matrices have other 
useful properties. 


For example, consider the symmetric matrix 


$$A = \begin{pmatrix} 4 & 2 & 0 \\ 2 & 3 & 2 \\ 0 & 2 & 2 \end{pmatrix}.$$

We will show that there is an orthonormal basis for $\mathbb{R}^3$ that consists of
eigenvectors of A.


You found in Exercise C120 that the eigenvalues of A are λ = 6, λ = 3 and
λ = 0, and that the eigenvectors are the non-zero vectors of the following
forms:

$(2k, 2k, k)$, corresponding to λ = 6,
$(-2k, k, 2k)$, corresponding to λ = 3,
$(k, -2k, 2k)$, corresponding to λ = 0.


Exercise C135 


Let $v_1 = (2k, 2k, k)$, $v_2 = (-2l, l, 2l)$ and $v_3 = (m, -2m, 2m)$, where k, l, m
are positive real numbers.


(a) Show that $\{v_1, v_2, v_3\}$ is an orthogonal basis for $\mathbb{R}^3$.


(b) Find values of k, l and m for which $|v_1| = |v_2| = |v_3| = 1$.


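In Subsection 5.4 of Unit C2 you saw that $\{v_1, v_2, \ldots, v_n\}$ is an
orthonormal basis for $\mathbb{R}^n$ if $v_i \cdot v_j = 0$ for $i \neq j$, and $|v_i| = 1$ for each i. It
follows from Exercise C135 that

$$E = \left\{\left(\tfrac{2}{3}, \tfrac{2}{3}, \tfrac{1}{3}\right), \left(-\tfrac{2}{3}, \tfrac{1}{3}, \tfrac{2}{3}\right), \left(\tfrac{1}{3}, -\tfrac{2}{3}, \tfrac{2}{3}\right)\right\}$$

is an orthonormal basis for $\mathbb{R}^3$. Since E is an eigenvector basis of A, we
say that E is an orthonormal eigenvector basis of A.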
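Both properties of E can be checked with exact arithmetic. In this Python sketch (helper names hypothetical), the table of all scalar products $e_i \cdot e_j$ should be the identity matrix, and each $e_j$ should satisfy $Ae_j = \lambda_je_j$ for the eigenvalues 6, 3 and 0 listed above.

```python
from fractions import Fraction as F


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


A = [[4, 2, 0], [2, 3, 2], [0, 2, 2]]
E = [(F(2, 3), F(2, 3), F(1, 3)),      # eigenvalue 6
     (F(-2, 3), F(1, 3), F(2, 3)),     # eigenvalue 3
     (F(1, 3), F(-2, 3), F(2, 3))]     # eigenvalue 0
eigenvalues = [6, 3, 0]

# Orthonormal: e_i . e_j is 1 when i == j and 0 otherwise.
gram = [[dot(u, v) for v in E] for u in E]
print(gram == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # True

# Eigenvectors: A e_j = lambda_j e_j for each j.
print(all(matvec(A, e) == [lam * x for x in e]
          for e, lam in zip(E, eigenvalues)))      # True
```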


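Following Strategy C20, we diagonalise the matrix A by writing down the
transition matrix P whose columns are formed from the standard
coordinates of the vectors in E:

$$P = \begin{pmatrix} \tfrac{2}{3} & -\tfrac{2}{3} & \tfrac{1}{3} \\ \tfrac{2}{3} & \tfrac{1}{3} & -\tfrac{2}{3} \\ \tfrac{1}{3} & \tfrac{2}{3} & \tfrac{2}{3} \end{pmatrix}.$$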




A transition matrix formed from an orthonormal eigenvector basis in this 
way is called an orthogonal matrix. 


Definition 


An n × n matrix whose columns form an orthonormal basis for $\mathbb{R}^n$ is
an orthogonal matrix.


It is important to remember that the columns of an orthogonal matrix are 
orthonormal vectors, not just orthogonal vectors, despite the name! 


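Consider the 2 × 2 matrix

$$A = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \end{pmatrix}.$$

The columns of A (as vectors) are orthogonal since

$$\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right) \cdot \left(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right) = \tfrac{1}{2} - \tfrac{1}{2} = 0.$$

Non-zero orthogonal vectors are linearly independent, so the columns of A form a
basis for $\mathbb{R}^2$.


The columns of A (as vectors) also have magnitude 1 since

$$\left(\tfrac{1}{\sqrt{2}}\right)^2 + \left(\tfrac{1}{\sqrt{2}}\right)^2 = 1 \quad\text{and}\quad \left(\tfrac{1}{\sqrt{2}}\right)^2 + \left(-\tfrac{1}{\sqrt{2}}\right)^2 = 1,$$

so the matrix A is an orthogonal matrix.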


Exercise C136 


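Show that $P^TP = I$, where

$$P = \begin{pmatrix} \tfrac{2}{3} & -\tfrac{2}{3} & \tfrac{1}{3} \\ \tfrac{2}{3} & \tfrac{1}{3} & -\tfrac{2}{3} \\ \tfrac{1}{3} & \tfrac{2}{3} & \tfrac{2}{3} \end{pmatrix}.$$

(P is the orthogonal matrix formed below Exercise C135.)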


We know that if $P^TP = I$, then $PP^T = I$ (by Theorem C18 in Unit C1),
so for the matrix P in Exercise C136, $P^T$ is the inverse of P; that is,
$P^T = P^{-1}$. We will prove that $P^T = P^{-1}$ for any orthogonal matrix P as
Theorem C65 in the next subsection.


It follows from this and Strategy C20 that

$$P^TAP = P^{-1}AP = \begin{pmatrix} 6 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$


We say that the matrix A has been orthogonally diagonalised. 


Definition 


The matrix A is orthogonally diagonalisable if there exists an 
orthogonal matrix P such that the matrix 


$$D = P^TAP = P^{-1}AP$$


is diagonal. 


The following strategy is a modification of Strategy C20 for diagonalising a 
matrix. 


Strategy C22 

To orthogonally diagonalise an n x n symmetric matrix A: 

1. find all the eigenvalues of A 

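2. find an orthonormal eigenvector basis $E = \{e_1, e_2, \ldots, e_n\}$ of A

3. write down the orthogonal transition matrix P whose jth column
is formed from the standard coordinates of $e_j$.

Then

$$P^TAP = D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix},$$

where $\lambda_j$ is the eigenvalue corresponding to the eigenvector $e_j$.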
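Steps 2 and 3 of Strategy C22 can be sketched in Python for a 2 × 2 case. The sketch assumes that the eigenpairs are supplied and that the eigenvectors are already mutually orthogonal (as Theorem C64 below guarantees for distinct eigenvalues of a symmetric matrix), so it only normalises them; it then uses $P^T$ in place of $P^{-1}$. The matrix $A = \begin{pmatrix} 5 & 2 \\ 2 & 5 \end{pmatrix}$ with eigenvalues 7 and 3 is an illustrative choice, the helper names are hypothetical, and floating-point results are compared with a tolerance.

```python
import math


def normalise(v):
    """Scale v to magnitude 1."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def orthogonally_diagonalise(A, eigenpairs):
    """Strategy C22, steps 2-3, assuming the given eigenvectors are mutually orthogonal."""
    units = [normalise(v) for _, v in eigenpairs]
    P = transpose(units)                     # column j is the unit eigenvector e_j
    D = matmul(matmul(transpose(P), A), P)   # P^T A P; no matrix inversion needed
    return P, D


A = [[5, 2], [2, 5]]
P, D = orthogonally_diagonalise(A, [(7, (1, 1)), (3, (1, -1))])
expected = [[7, 0], [0, 3]]
print(all(math.isclose(D[i][j], expected[i][j], abs_tol=1e-12)
          for i in range(2) for j in range(2)))   # True
```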


In Section 4 you will see that orthogonal diagonalisation is used for 
classifying conics and quadrics. However, if the aim is simply to 
diagonalise a symmetric matrix as opposed to orthogonally diagonalise it, 
then use Strategy C20 — this saves time and effort when an orthonormal 
basis, or equivalently an orthogonal transition matrix, is not required. It is 
always a good idea to consider carefully what a problem requires you to do 
in order to solve it in the most efficient way. 


You may have noticed that the words ‘if possible’ appear in Strategy C20, 
but not in Strategy C22. This is due to the fact that an n x n symmetric 
matrix A always has an orthonormal eigenvector basis, so it must be 
orthogonally diagonalisable. It is also true that any orthogonally 
diagonalisable matrix A must be symmetric — you might like to prove this 
yourself; it is included as a ‘challenging’ exercise in the additional exercises 
booklet for this unit. 


In the case where a symmetric matrix A has n distinct eigenvalues, the 
fact that A has an orthonormal eigenvector basis follows from the 
following result. 




Figure 9 $v^Tw = v \cdot w$




Theorem C64 


Eigenvectors corresponding to distinct eigenvalues of a symmetric 
matrix are orthogonal. 


Proof Let A be a symmetric matrix, and let v and w be eigenvectors
of A corresponding to the distinct eigenvalues λ and μ. Then

$$Av = \lambda v \quad\text{and}\quad Aw = \mu w.$$

To show that v and w are orthogonal, we need to show that $v \cdot w = 0$.
We do this by writing $v^TAw$ in two ways and using the fact that
$v^Tw = v \cdot w$. This fact is illustrated in Figure 9.

We have

$$v^TAw = v^T(Aw) = v^T(\mu w) = \mu(v^Tw) = \mu(v \cdot w).$$

Since A is symmetric, we have $A^T = A$, and therefore

$$v^TA = v^TA^T = (Av)^T.$$

It follows that

$$v^TAw = (v^TA)w = (Av)^Tw = (\lambda v)^Tw = \lambda(v^Tw) = \lambda(v \cdot w).$$

Therefore $\lambda(v \cdot w) = \mu(v \cdot w)$; thus

$$(\lambda - \mu)(v \cdot w) = 0.$$

Since the eigenvalues λ and μ are distinct, $\lambda - \mu$ is non-zero, and hence
$v \cdot w = 0$. The two eigenvectors v and w are orthogonal, as required. ■


The following exercises show how Theorem C64 can be used. 


Worked Exercise C71 


Orthogonally diagonalise the symmetric matrix 


$$A = \begin{pmatrix} 5 & 2 \\ 2 & 5 \end{pmatrix}.$$




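Since any eigenvectors corresponding to these eigenvalues are
orthogonal by Theorem C64, we form an orthonormal eigenvector
basis of A by taking an eigenvector of magnitude 1 corresponding to
each of the two distinct eigenvalues.


An eigenvector of magnitude 1 corresponding to λ = 7 is

$$\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right).$$

An eigenvector of magnitude 1 corresponding to λ = 3 is

$$\left(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right).$$

It follows from Theorem C64 that an orthonormal eigenvector basis
of A is

$$E = \left\{\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right), \left(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right)\right\}.$$

We use the eigenvectors in E to form the columns of the orthogonal
transition matrix:

$$P = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \end{pmatrix}.$$

We use the eigenvalues corresponding to the eigenvectors in E to form
the diagonal matrix:

$$P^TAP = D = \begin{pmatrix} 7 & 0 \\ 0 & 3 \end{pmatrix}.$$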


Exercise C137 


Orthogonally diagonalise each of the following symmetric matrices. 


5 -1 -1 
(a) Oe E (b) A= E 


The eigenvalues in part (b) are λ = 6, λ = 3 and λ = 2.




So far, in each case where we have orthogonally diagonalised an n x n 
symmetric matrix, we have had n distinct eigenvalues and Theorem C64 
has ensured that the eigenvectors are all orthogonal. We have then formed 
an orthonormal eigenvector basis for the matrix by writing down basis 
vectors of magnitude 1. Where the eigenvalues of the symmetric matrix
are not all distinct, we have to find an orthonormal basis for each
eigenspace; Theorem C64 then ensures that the resulting set of
eigenvectors forms an orthonormal eigenvector basis for the matrix.


The following strategy is a modification of Strategy C21. It reflects the 
fact that we can always find an orthonormal basis comprising r vectors for 
an eigenspace of a symmetric matrix corresponding to an eigenvalue of 
multiplicity r. This result is not proved here. 


Strategy C23 

To find an orthonormal eigenvector basis of a symmetric matrix A: 
1. find an orthonormal basis for each eigenspace of A 

2. form the set E of all the basis vectors found in step 1. 


Then E is an orthonormal eigenvector basis of A.


Worked Exercise C72 


Orthogonally diagonalise the symmetric matrix 


$$A = \begin{pmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{pmatrix}.$$


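To find an orthogonal basis for the eigenspace S(2), we use the
Gram-Schmidt orthogonalisation process.
Let the orthogonal basis we seek be $\{v_1, v_2\}$, with $v_1 = (1, 0, -1)$.
Then

$$\begin{aligned}
v_2 &= (0, 1, -1) - \frac{(0, 1, -1) \cdot (1, 0, -1)}{(1, 0, -1) \cdot (1, 0, -1)}\,(1, 0, -1) \\
&= (0, 1, -1) - \tfrac{1}{2}(1, 0, -1) \\
&= \left(-\tfrac{1}{2}, 1, -\tfrac{1}{2}\right).
\end{aligned}$$

Dividing $v_2$ by $|v_2| = \sqrt{6}/2$ gives a unit basis vector. Although it is
not necessary, it is often helpful to minimise the minus signs involved: we
can multiply through by −1 to get another unit basis vector orthogonal to $v_1$.


An orthonormal basis for S(2) is therefore

$$\left\{\left(\tfrac{1}{\sqrt{2}}, 0, -\tfrac{1}{\sqrt{2}}\right), \left(\tfrac{1}{\sqrt{6}}, -\tfrac{2}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}\right)\right\}.$$

We have ensured that the eigenvectors in the basis for S(2) are
orthogonal, and by Theorem C64 the eigenvectors corresponding to
the distinct eigenvalues λ = 8 and λ = 2 are orthogonal.


By Theorem C64 an orthonormal eigenvector basis of A is therefore

$$E = \left\{\left(\tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}\right), \left(\tfrac{1}{\sqrt{2}}, 0, -\tfrac{1}{\sqrt{2}}\right), \left(\tfrac{1}{\sqrt{6}}, -\tfrac{2}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}\right)\right\}.$$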


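We use the eigenvectors in E to form the columns of the transition
matrix:

$$P = \begin{pmatrix} \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} \\ \tfrac{1}{\sqrt{3}} & 0 & -\tfrac{2}{\sqrt{6}} \\ \tfrac{1}{\sqrt{3}} & -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} \end{pmatrix}.$$

We use the eigenvalues corresponding to the eigenvectors in E to form
the diagonal matrix:

$$P^TAP = D = \begin{pmatrix} 8 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$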




The diagonal matrix found here is the same as that found in Worked 
Exercise C70, since the eigenvalues are considered in the same order. The 
difference in the diagonalisation lies in the transition matrix, which in this 
case is orthogonal. 
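Strategy C23 for this matrix, including the Gram-Schmidt step, can be sketched in Python as follows. The orthogonalisation uses exact fractions, so that $v_2 = (-\tfrac{1}{2}, 1, -\tfrac{1}{2})$ can be compared directly with the calculation above; normalising then needs square roots, so the final checks that $P^TP = I$ and $P^TAP = D$ use floating point with a tolerance. The function names are hypothetical, and the sketch keeps the sign that Gram-Schmidt produces rather than multiplying by −1 as the worked exercise does; either choice gives a valid orthogonal P.

```python
import math
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Orthogonalise linearly independent vectors, using exact fractions."""
    basis = []
    for w in vectors:
        v = [Fraction(x) for x in w]
        for u in basis:
            # Subtract the component of w in the direction of u.
            coeff = dot(w, u) / dot(u, u)
            v = [a - coeff * b for a, b in zip(v, u)]
        basis.append(v)
    return basis


def normalise(v):
    length = math.sqrt(sum(float(x) ** 2 for x in v))
    return [float(x) / length for x in v]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def close_to(M, N, tol=1e-12):
    return all(math.isclose(a, b, abs_tol=tol)
               for ra, rb in zip(M, N) for a, b in zip(ra, rb))


A = [[4, 2, 2], [2, 4, 2], [2, 2, 4]]
s2 = gram_schmidt([(1, 0, -1), (0, 1, -1)])          # orthogonal basis for S(2)
print(s2[1] == [Fraction(-1, 2), 1, Fraction(-1, 2)])   # True

# Orthonormal eigenvector basis: S(8) first, then S(2), as in the text.
columns = [normalise((1, 1, 1))] + [normalise(v) for v in s2]
P = [[col[i] for col in columns] for i in range(3)]
PT = [list(row) for row in zip(*P)]
print(close_to(matmul(PT, P), [[1, 0, 0], [0, 1, 0], [0, 0, 1]]))          # True
print(close_to(matmul(matmul(PT, A), P), [[8, 0, 0], [0, 2, 0], [0, 0, 2]]))  # True
```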


Exercise C138 


Orthogonally diagonalise the symmetric matrix 


$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix}.$$

(In Exercise C134 you found the eigenvalues and eigenvectors of A: a
basis for S(3) is $\{(0, 1, 1)\}$ and a basis for S(1) is $\{(1, 0, 0), (0, 1, -1)\}$.)


We conclude this subsection by noting that every symmetric matrix can be 
orthogonally diagonalised and conversely that an orthogonally 
diagonalisable matrix is symmetric. However, it is possible to diagonalise 
(but not orthogonally diagonalise) a non-symmetric matrix that has an 
eigenvector basis. 


3.2 Orthogonal matrices 

In this subsection we look at some properties of orthogonal matrices. 
Remember that the columns of an orthogonal matrix form an orthonormal 
basis, not merely an orthogonal basis; that is, the columns are orthogonal 
vectors of magnitude 1. 


We have said that whenever P is an orthogonal matrix we have $P^T = P^{-1}$.
We now prove this result.


Theorem C65


A square matrix P is orthogonal if and only if $P^T = P^{-1}$.


Proof We know by Theorem C18 in Unit C1 that $P^TP = I$ if and only if
$PP^T = I$, so $P^T = P^{-1}$ if and only if $P^TP = I$.


So we need to show that P is orthogonal if and only if $P^TP = I$. We
start off by considering the expression $P^TP$.


Let the columns of the matrix P be the column vectors $x_1, x_2, \ldots, x_n$.
Then the rows of the matrix $P^T$ are the row vectors $x_1^T, x_2^T, \ldots, x_n^T$.


For each i and j, the (i, j)-entry of $P^TP$ is the scalar product of the ith
row of $P^T$ and the jth column of P; that is, $x_i \cdot x_j$.


So $P^TP = I$ if and only if

$$x_i \cdot x_j = 0 \text{ whenever } i \neq j \quad\text{and}\quad x_i \cdot x_i = 1 \text{ for each } i.$$

This is the case precisely when $\{x_1, x_2, \ldots, x_n\}$ is an orthonormal basis
for $\mathbb{R}^n$; that is, when P is orthogonal. ■


Several properties of orthogonal matrices follow from Theorem C65. 


Corollary C66
Let P and Q be orthogonal n × n matrices. Then:
(a) $P^{-1}\ (= P^T)$ is orthogonal
(b) the rows of P form an orthonormal basis for $\mathbb{R}^n$
(c) $\det P = \pm 1$
(d) the product PQ is orthogonal.


Proof (a) To show that $P^{-1}$ is orthogonal we must show that the
transpose of $P^{-1}$ is the inverse of $P^{-1}$.
By Theorem C65 we have $P^T = P^{-1}$. Now,

$$(P^{-1})^TP^{-1} = (P^T)^TP^{-1} = PP^{-1} = I.$$

Thus $(P^{-1})^T = (P^{-1})^{-1}$, so $P^{-1}\ (= P^T)$ is orthogonal.


(b) The rows of P are the columns of $P^T$. The matrix $P^T$ is orthogonal
by part (a), so its columns form an orthonormal basis for $\mathbb{R}^n$. Thus
the rows of P form an orthonormal basis for $\mathbb{R}^n$.


(c) We know that $\det P^T = \det P$, and $P^T = P^{-1}$ by Theorem C65, so
$P^TP = I$.


By Theorem C14 in Unit C1 we know that
$\det(AB) = (\det A)(\det B)$ and $\det A^T = \det A$ for square matrices A
and B of the same size.


Now,

$$\det(P^TP) = (\det P^T)(\det P) = (\det P)^2,$$

but

$$\det(P^TP) = \det I = 1,$$

so $(\det P)^2 = 1$. Hence $\det P = \pm 1$.


(d) The proof of this is left for you to do in Exercise C139. ■


Exercise C139 


Let P and Q be orthogonal n x n matrices. Prove that the product PQ is 
orthogonal. 


(This is part (d) of Corollary C66.) 




Figure 10 The angle θ made by the vector (a, c)




To understand why orthogonal diagonalisation is useful, beyond the ease
of finding the inverse of the transition matrix, we will now look at the
geometry of orthogonal transition matrices in $\mathbb{R}^2$ and $\mathbb{R}^3$.


We begin by asking to what transformations of the plane the 2 × 2
orthogonal matrices correspond. Suppose that

$$P = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

is an orthogonal matrix. Then the vectors (a, c) and (b, d) form an
orthonormal basis for $\mathbb{R}^2$ and $\det P = \pm 1$.


We stated in Subsection 3.2 of Unit C3 that the magnitude of the 
determinant of a matrix of a linear transformation gives the ‘scaling 
factor’. Therefore $\det P = \pm 1$ means that there is no scaling; that is,
magnitudes are preserved. 


Let θ be the angle that the unit vector (a, c) makes with the x-axis, as
illustrated in Figure 10 for the case that (a, c) is in the first quadrant, so

$$(a, c) = (\cos\theta, \sin\theta).$$

Since the unit vector (b, d) is orthogonal to (a, c), we have $(a, c) \cdot (b, d) = 0$,
so

$$(b, d) = (-\sin\theta, \cos\theta) \quad\text{or}\quad (\sin\theta, -\cos\theta),$$

as illustrated in Figure 11.

Figure 11 The two possible vectors (b, d) orthogonal to the vector (a, c)

Hence, if det P = +1, then

$$P = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},$$

and if det P = −1, then

$$P = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}.$$
Now suppose that $E = \{e_1, e_2\}$ is an orthonormal basis for $\mathbb{R}^2$ and that P
is the orthogonal transition matrix whose columns are formed from the
coordinates of $e_1$ and $e_2$.


We have just seen that if det P = +1, then

$$e_1 = (\cos\theta, \sin\theta) \quad\text{and}\quad e_2 = (-\sin\theta, \cos\theta);$$

that is, $e_1$ and $e_2$ are the images of the standard basis vectors (1, 0) and
(0, 1) under a rotation $r_\theta$, as illustrated in Figure 12.

Figure 12 A rotation $r_\theta$


Similarly, if det P = −1, then $e_1$ and $e_2$ are the images of the standard
basis vectors (1, 0) and (0, 1) under a reflection $q_{\theta/2}$, as illustrated in
Figure 13.

Figure 13 A reflection $q_{\theta/2}$


So if a 2 x 2 orthogonal matrix P is used to represent a linear 
transformation (as opposed to a transition matrix), then the linear 
transformation must be either a rotation or a reflection. 
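The two families of 2 × 2 orthogonal matrices found above can be checked numerically. In this Python sketch the angle θ = π/6 is an arbitrary choice and the function names are hypothetical. It confirms that both matrices have orthonormal columns, that their determinants are +1 and −1, and that the second matrix fixes the unit vector at angle θ/2, assuming (as Figure 13 suggests) that $q_{\theta/2}$ denotes reflection in the line through the origin at angle θ/2 to the x-axis.

```python
import math


def rotation(theta):
    """The orthogonal matrix with det P = +1 (rotation r_theta)."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]


def reflection(theta):
    """The orthogonal matrix with det P = -1 (reflection q_{theta/2})."""
    return [[math.cos(theta), math.sin(theta)],
            [math.sin(theta), -math.cos(theta)]]


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def has_orthonormal_columns(M, tol=1e-12):
    """True if the columns of M are orthogonal unit vectors."""
    cols = list(zip(*M))
    return all(math.isclose(sum(a * b for a, b in zip(cols[i], cols[j])),
                            1.0 if i == j else 0.0, abs_tol=tol)
               for i in range(2) for j in range(2))


def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


theta = math.pi / 6
R, Q = rotation(theta), reflection(theta)
print(has_orthonormal_columns(R), round(det2(R), 12))   # True 1.0
print(has_orthonormal_columns(Q), round(det2(Q), 12))   # True -1.0

# The reflection leaves unchanged the unit vector at angle theta/2.
half = [math.cos(theta / 2), math.sin(theta / 2)]
print(all(math.isclose(a, b) for a, b in zip(matvec(Q, half), half)))   # True
```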


Similar arguments can be applied to 3 × 3 orthogonal matrices to show
that linear transformations of $\mathbb{R}^3$ whose matrices are orthogonal are
rotations about a line through the origin, reflections in a plane through the
origin or combinations of these. The orthogonal matrices representing
rotations of $\mathbb{R}^3$ are precisely those with determinant +1.




Exercise C140 


Consider the matrix 


$$A = \begin{pmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}.$$


(a) Verify that this matrix is orthogonal.


(b) Show that this matrix represents a rotation of $\mathbb{R}^3$.


Let t be a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$ with a matrix
representation that is a symmetric matrix A. In effect, when we
orthogonally diagonalise A, we are finding a basis for $\mathbb{R}^n$ for which

• the matrix of t is diagonal
• the basis vectors are orthogonal
• the basis vectors have magnitude 1.

For $\mathbb{R}^2$ and $\mathbb{R}^3$ this new basis is simply the standard basis rotated,
reflected or, for $\mathbb{R}^3$, a combination of the two.


4 Conics and quadrics 


In this section you will classify conics and quadrics using many of the 
techniques you have learned in this book on linear algebra, including 
orthogonal diagonalisation of symmetric matrices. 


You revised conics in Unit A4 Real functions, graphs and conics. 


4.1 Classifying conics 


A non-degenerate conic may be a circle, an ellipse, a parabola or a 
hyperbola. It is said to be in standard position if it is positioned in the 
plane as follows. 


• For a circle: its centre is at the origin.


• For an ellipse: its axes of symmetry are the x- and y-axes, and its
largest width is along the x-axis.


• For a parabola: its axis of symmetry is the x-axis, it passes through the
origin and its other points lie to the right of the origin.


• For a hyperbola: its axes of symmetry are the x- and y-axes, and it
crosses the x-axis.


A circle may sometimes be considered to be a special type of ellipse, and 
that will be the case throughout this section. 


An ellipse, a parabola and a hyperbola in standard position are illustrated 
in Figure 14. 


Figure 14 Conics in standard position: (a) ellipse, (b) parabola and
(c) hyperbola


The line joining the vertices of an ellipse is the major axis of the ellipse, 
and the line perpendicular to this through the centre of the ellipse is the 
minor axis of the ellipse. Thus, for an ellipse in standard position, the 
major and minor axes are the x-axis and y-axis, respectively. 


We can define major and minor axes for parabolas and hyperbolas
similarly.
• For a parabola, the major axis is the axis of the parabola, and the minor
axis is the line perpendicular to this through the vertex of the parabola.


• For a hyperbola, the major axis is the line joining the vertices of the
hyperbola, and the minor axis is the line perpendicular to this through
the centre of the hyperbola.

Notice that in each case the minor axis is parallel to the directrix of the conic.
(You met the directrix of a conic in Section 5 of Unit A4.)


In this way, the major and minor axes of any conic in standard position 
are the x-axis and y-axis, respectively. 


An ellipse in standard position has equation

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1,$$

a parabola in standard position has equation

$$y^2 = 4ax$$

and a hyperbola in standard position has equation

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1.$$




Figure 15 Moving the axes to 
be able to recognise a conic 




Theorem A21 in Unit A4 says that any conic in $\mathbb{R}^2$ is the set of
points (x, y) in $\mathbb{R}^2$ that satisfy an equation of the following form:

$$Ax^2 + Bxy + Cy^2 + Fx + Gy + H = 0, \tag{6}$$

where A, B, C, F, G and H are real numbers, and A, B and C are not all
zero. This theorem also says the converse: that the set of all points in $\mathbb{R}^2$
whose coordinates (x, y) satisfy an equation of this form is a conic.
However, such a conic may be degenerate; in this subsection we will only
be concerned with non-degenerate conics.


Given the equation of a non-degenerate conic, such as

$$x^2 - 4xy - 2y^2 + 6x + 12y + 21 = 0, \tag{7}$$

we would like to be able to decide whether it represents an ellipse, a
hyperbola or a parabola. We know it is not a circle because of the non-zero
term in xy, but it is hard to determine more than this by inspection.
Generally, the conics that arise in calculations are not in standard
position, so we need some way of determining the nature of a
conic from its equation.


In fact, equation (7) represents a hyperbola with centre (1,2), major axis 
y = 2x and minor axis x + 2y = 5. This conic would be easily recognisable 
were we to move the axes of the plane so that they pass through the centre 
and line up with the major and minor axes of the conic, as illustrated in 
Figure 15. 


You will see that we can move the axes of the plane by introducing 
matrices and changing the basis for the plane, then performing a 
translation so that the conic is in standard position with respect to these 
new basis vectors. The conic will then be easily recognisable from its 
equation. 


We will actually be a little less specific with how we move the axes 
mathematically and may not always end up with a conic in standard 
position: the axes may be interchanged or pointing in the opposite
directions, resulting in conics that are reflected or rotated. However, in
every case the axes will align with the major and minor axes of the conic, 
and the equation will resemble the equation of a conic in standard 
position; we say that such an equation of a conic is in standard form. 


An ellipse and a hyperbola with equations in standard form, but that are 
not in standard position, are illustrated in Figure 16. 


Figure 16 Conics not in standard position with equations in standard form:
(a) ellipse and (b) hyperbola


Parabolas with equations in standard form, but not in standard position, 
are illustrated in Figure 17. 


Figure 17 Parabolas not in standard position with equations in standard
form (panels: $x^2 = 4ay$ with $a > 0$; $y^2 = 4ax$ with $a < 0$; $x^2 = 4ay$ with $a < 0$)
Introducing matrices 


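We first write equation (6), $Ax^2 + Bxy + Cy^2 + Fx + Gy + H = 0$, using
matrices and vectors; that is, in matrix form as

$$\mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{J}^T\mathbf{x} + H = 0, \tag{8}$$

where

$$\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad \mathbf{A} = \begin{pmatrix} A & \tfrac{1}{2}B \\ \tfrac{1}{2}B & C \end{pmatrix} \quad\text{and}\quad \mathbf{J} = \begin{pmatrix} F \\ G \end{pmatrix}.$$

This is possible, since

$$\begin{aligned}
\mathbf{x}^T\mathbf{A}\mathbf{x} &= \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} A & \tfrac{1}{2}B \\ \tfrac{1}{2}B & C \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} Ax + \tfrac{1}{2}By \\ \tfrac{1}{2}Bx + Cy \end{pmatrix} \\
&= Ax^2 + Bxy + Cy^2
\end{aligned}$$

and

$$\mathbf{J}^T\mathbf{x} = \begin{pmatrix} F & G \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = Fx + Gy.$$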
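To confirm that the matrix form (8) reproduces the original equation, this Python sketch evaluates both left-hand sides at a few points for the coefficients of equation (7), with $\tfrac{1}{2}B$ placed in both off-diagonal entries of the symmetric matrix. The tuple order (A, B, C, F, G, H) and the function names are assumptions made for illustration; exact fractions avoid any rounding in $\tfrac{1}{2}B$.

```python
from fractions import Fraction


def conic_polynomial(coeffs, x, y):
    """Left-hand side of Ax^2 + Bxy + Cy^2 + Fx + Gy + H = 0."""
    A, B, C, F, G, H = coeffs
    return A * x**2 + B * x * y + C * y**2 + F * x + G * y + H


def conic_matrix_form(coeffs, x, y):
    """Left-hand side of x^T A x + J^T x + H = 0, with A symmetric."""
    A, B, C, F, G, H = coeffs
    M = [[Fraction(A), Fraction(B, 2)],
         [Fraction(B, 2), Fraction(C)]]
    v = [x, y]
    Mv = [M[i][0] * v[0] + M[i][1] * v[1] for i in range(2)]
    quadratic = v[0] * Mv[0] + v[1] * Mv[1]   # x^T A x
    linear = F * x + G * y                    # J^T x with J = (F, G)
    return quadratic + linear + H


coeffs = (1, -4, -2, 6, 12, 21)   # equation (7)
points = [(0, 0), (1, 2), (3, -1), (Fraction(1, 2), 5)]
print(all(conic_polynomial(coeffs, *p) == conic_matrix_form(coeffs, *p)
          for p in points))   # True
```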




Notice that the matrix A is symmetric; this will be important. 


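For example, the conic with equation (7) can be written in matrix form (8)
with

$$\mathbf{A} = \begin{pmatrix} 1 & -2 \\ -2 & -2 \end{pmatrix}, \quad \mathbf{J} = \begin{pmatrix} 6 \\ 12 \end{pmatrix} \quad\text{and}\quad H = 21.$$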


Exercise C141 


For each of the following equations of a conic in standard position, write
the equation in matrix form and specify the matrices A and J.


(a) the ellipse $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} = 1$ (b) the hyperbola $\dfrac{x^2}{a^2} - \dfrac{y^2}{b^2} = 1$


(c) the parabola $y^2 = 4ax$


Aligning the axes 


The matrix A in the matrix representation (8) is symmetric, so we know that we can orthogonally diagonalise this matrix to get PᵀAP = D, where P is an orthogonal transition matrix.

This helps us recognise the conic by aligning the basis vectors with the axes of the conic, and therefore removing the xy-term from the equation. The columns of P form an orthonormal basis E, and P changes E-coordinates x_E, which we will write in the form x' = (x', y'), into standard coordinates x = (x, y), so that x = Px'.


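In this way equation (8) becomes

$$(P\mathbf{x}')^TA(P\mathbf{x}') + J^TP\mathbf{x}' + H = 0,$$

which, since (Px')ᵀ = (x')ᵀPᵀ, can be rewritten as

$$(\mathbf{x}')^T(P^TAP)\mathbf{x}' + J^TP\mathbf{x}' + H = 0. \qquad (9)$$

Now, PᵀAP = D is a diagonal matrix with diagonal entries λ₁ and λ₂, so we have

$$(\mathbf{x}')^T(P^TAP)\mathbf{x}' = (\mathbf{x}')^TD\mathbf{x}' = \begin{pmatrix} x' & y' \end{pmatrix}\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}\begin{pmatrix} x' \\ y' \end{pmatrix} = \lambda_1(x')^2 + \lambda_2(y')^2,$$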


and therefore there is no x'y'-term in the new equation (9) for the conic. 
Written in the form of equation (9), this now more closely resembles the 
equation of a conic in standard position. The vectors in the orthonormal 
basis E of the plane are aligned with the axes of the conic: we say we have 
aligned the axes. 


The order and direction in which the eigenvectors are chosen affect the orthonormal basis E and therefore the transition matrix P obtained.

However, in every case P is an orthogonal matrix and so det P = ±1. Orthogonal diagonalisation ensures that the new basis vectors are orthogonal (perpendicular) and of magnitude 1. If P is considered to represent a linear transformation (as opposed to a transition matrix), then the linear transformation is either a rotation (det P = +1) or a reflection (det P = −1).

It is sometimes preferable, when choosing the orthonormal basis E, for it to be a rotation (rather than a reflection) of the standard basis vectors; that is, for P, considered as a linear transformation, to be a rotation. This is achieved by ensuring that det P = +1 (either by geometric insight or by checking the determinant). However, this step is not required in this module.


We now illustrate the process of rewriting a conic in the form of 
equation (9) by applying the process to equation (7), where 


$$A = \begin{pmatrix} 1 & -2 \\ -2 & -2 \end{pmatrix}.$$


Worked Exercise C73 


Express the non-degenerate conic 


x² − 4xy − 2y² + 6x + 12y + 21 = 0


in the form of equation (9). 


Solution 


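The matrix form of the equation of the conic is xᵀAx + Jᵀx + 21 = 0, where

$$A = \begin{pmatrix} 1 & -2 \\ -2 & -2 \end{pmatrix} \quad\text{and}\quad J = \begin{pmatrix} 6 \\ 12 \end{pmatrix}.$$

In Exercise C119(b) you found that the eigenvectors of A are the non-zero vectors (k, 2k) and (−2k, k), corresponding to the eigenvalues λ = −3 and λ = 2, respectively.

We start by orthogonally diagonalising A, using Strategy C22.

An orthonormal basis for S(−3) is

$$\left\{\left(\tfrac{1}{\sqrt{5}}, \tfrac{2}{\sqrt{5}}\right)\right\}$$

and an orthonormal basis for S(2) is

$$\left\{\left(-\tfrac{2}{\sqrt{5}}, \tfrac{1}{\sqrt{5}}\right)\right\}.$$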




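By Theorem C64 an orthonormal eigenvector basis of A is therefore

$$E = \left\{\left(\tfrac{1}{\sqrt{5}}, \tfrac{2}{\sqrt{5}}\right), \left(-\tfrac{2}{\sqrt{5}}, \tfrac{1}{\sqrt{5}}\right)\right\}.$$

We use the eigenvectors in E to form the columns of the transition matrix:

$$P = \begin{pmatrix} \tfrac{1}{\sqrt{5}} & -\tfrac{2}{\sqrt{5}} \\ \tfrac{2}{\sqrt{5}} & \tfrac{1}{\sqrt{5}} \end{pmatrix}.$$

Note that det P = +1, so the basis vectors in E are the images of the standard basis vectors under a rotation, but that does not concern us here.

We use the eigenvalues to form the diagonal matrix

$$D = P^TAP = \begin{pmatrix} -3 & 0 \\ 0 & 2 \end{pmatrix}.$$

We substitute into (x')ᵀ(PᵀAP)x' + JᵀPx' + H = 0. It follows from equation (9) that the equation of the conic is now

$$\begin{pmatrix} x' & y' \end{pmatrix}\begin{pmatrix} -3 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} x' \\ y' \end{pmatrix} + \begin{pmatrix} 6 & 12 \end{pmatrix}\begin{pmatrix} \tfrac{1}{\sqrt{5}} & -\tfrac{2}{\sqrt{5}} \\ \tfrac{2}{\sqrt{5}} & \tfrac{1}{\sqrt{5}} \end{pmatrix}\begin{pmatrix} x' \\ y' \end{pmatrix} + 21 = 0.$$

Since (6  12)P = (30/√5  0) = (6√5  0), this is

$$-3(x')^2 + 2(y')^2 + 6\sqrt{5}\,x' + 21 = 0.$$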


There are no terms in x'y' in this new equation.
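The steps of this worked exercise can be mirrored in a short Python sketch. It is an illustration under stated assumptions rather than the module's method: instead of solving the characteristic equation as in Strategy C22, it uses the closed-form eigenvalues ½(a + c) ± √(¼(a − c)² + b²) of a symmetric matrix with rows (a, b) and (b, c), lists them in ascending order, and fixes the signs of the eigenvectors so that det P = +1. The function names are hypothetical.

```python
import math

def orthogonally_diagonalise_2x2(a, b, c):
    """Orthogonally diagonalise the symmetric matrix [[a, b], [b, c]].

    Returns (lam1, lam2, P) with lam1 <= lam2 and P^T A P = diag(lam1, lam2).
    The second column of P is the first rotated anticlockwise by pi/2,
    so det P = +1 (P represents a rotation).
    """
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    lam1, lam2 = mean - radius, mean + radius
    if b != 0:
        vx, vy = b, lam1 - a      # (b, lam - a) solves the first eigenvector equation
    elif a <= c:
        vx, vy = 1.0, 0.0         # already diagonal with the smaller entry first
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    if vx < 0 or (vx == 0 and vy < 0):
        vx, vy = -vx, -vy         # choose the sign so the first non-zero entry is positive
    P = [[vx, -vy], [vy, vx]]
    return lam1, lam2, P

def linear_coefficients(J, P):
    """Return (f, g) = J^T P."""
    return (J[0] * P[0][0] + J[1] * P[1][0],
            J[0] * P[0][1] + J[1] * P[1][1])

lam1, lam2, P = orthogonally_diagonalise_2x2(1, -2, -2)   # matrix A of equation (7)
f, g = linear_coefficients([6, 12], P)
print(lam1, lam2)   # -3.0 2.0
print(f, g)         # f is approximately 13.416 (= 6*sqrt(5)) and g is 0
```

With the eigenvalues in ascending order this reproduces the choices made above (λ = −3 then λ = 2, with det P = +1). Listing the eigenvalues the other way round, as in Exercise C142, would swap the columns of P and give det P = −1.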


You might wonder what the equation in Worked Exercise C73 would have been if the eigenvalues had been chosen in the opposite order. The next exercise investigates this.


Exercise C142 


Express the non-degenerate conic 


x² − 4xy − 2y² + 6x + 12y + 21 = 0

in the form of equation (9), using the eigenvalues in the order λ = 2 then λ = −3.




The equation of the conic with the eigenvalues λ = −3 then λ = 2 and the equation of the conic with the eigenvalues λ = 2 then λ = −3 are very similar. It looks as though the roles of x' and y' have been interchanged; that is, the order of the coordinates has been interchanged, which corresponds to interchanging the axes. We have det P = −1 in Exercise C142, so this transition matrix corresponds to a reflection of the axes, whereas we have det P = +1 in Worked Exercise C73, so this transition matrix corresponds to a rotation.

In general, for any conic, if

$$D = P^TAP = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix},$$

then equation (9) is of the form

$$\lambda_1(x')^2 + \lambda_2(y')^2 + fx' + gy' + H = 0, \qquad (10)$$

where (f  g) = JᵀP.


The equation of the conic in this form has been simplified since it now has 
no x'y' terms, but is not yet in a form from which we can easily recognise 
the type of the conic: a translation of the axes is also required. 


Translating the origin 


To write the equation of the conic in standard form, from which we can easily recognise the type of the conic, we need to eliminate any superfluous linear x' and y' terms. This is achieved by translating the origin using an (α, β)-translation and moving to new coordinates x'' = (x'', y''): we say we have translated the origin.

To do this, we first complete the squares in the equation of the conic. We illustrate this process using the conic with equation (7). We have already aligned the axes to obtain the equation

−3(x')² + 2(y')² + 6√5x' + 21 = 0,

which is equivalent to

−3((x')² − 2√5x') + 2(y')² + 21 = 0.

This equation has no linear y' term, so we only need to complete the square involving x'. We obtain

−3(x' − √5)² + 15 + 2(y')² + 21 = 0,

so

−3(x' − √5)² + 2(y')² + 36 = 0.

In Subsection 1.3 of Unit A4 you saw that applying an (α, β)-translation to the graph of y = f(x) gives the graph of y = f(x − α) + β, or equivalently, y − β = f(x − α). We can express this translated curve more simply by using new (x', y')-coordinates obtained by an (α, β)-translation of the (x, y)-axes: we do this by setting x' = x − α and y' = y − β. In this new (x', y')-coordinate system the equation of the translated curve is y' = f(x').




For our conic we use a (√5, 0)-translation, so we set the new coordinates to be

x'' = (x'', y'') = (x' − √5, y').

Thus we rewrite the equation of the conic using these coordinates by substituting

x'' = x' − √5 and y'' = y',

which results in the following simplified equation of the conic:

−3(x'')² + 2(y'')² = −36,

or

$$\frac{(x'')^2}{12} - \frac{(y'')^2}{18} = 1.$$

This equation is now recognisable as the equation of a hyperbola in standard form. In fact, it is also a hyperbola in standard position with respect to these new axes, since the (x'')² term is positive and the (y'')² term is negative.
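One way to check the whole chain of coordinate changes is to run it backwards. The following Python sketch (our own illustration; the helper names equation_7 and to_standard_coordinates are hypothetical) takes points on the hyperbola (x'')²/12 − (y'')²/18 = 1, undoes the translation x'' = x' − √5, applies x = Px' with the transition matrix P of Worked Exercise C73, and confirms that the resulting points satisfy equation (7) up to rounding error.

```python
import math

SQRT5 = math.sqrt(5)
# Transition matrix P from Worked Exercise C73 (columns: eigenvectors for -3 and 2)
P = [[1 / SQRT5, -2 / SQRT5],
     [2 / SQRT5, 1 / SQRT5]]

def equation_7(x, y):
    """Left-hand side of x^2 - 4xy - 2y^2 + 6x + 12y + 21 = 0."""
    return x * x - 4 * x * y - 2 * y * y + 6 * x + 12 * y + 21

def to_standard_coordinates(x2, y2):
    """Map (x'', y'') back to (x, y): undo the (sqrt5, 0)-translation, then apply x = P x'."""
    x1, y1 = x2 + SQRT5, y2          # x' = x'' + sqrt5, y' = y''
    return (P[0][0] * x1 + P[0][1] * y1,
            P[1][0] * x1 + P[1][1] * y1)

# Points on both branches of (x'')^2/12 - (y'')^2/18 = 1, using cosh^2 - sinh^2 = 1
for t in [-1.5, -0.3, 0.0, 0.8, 2.0]:
    for sign in (1, -1):
        x2 = sign * math.sqrt(12) * math.cosh(t)
        y2 = math.sqrt(18) * math.sinh(t)
        x, y = to_standard_coordinates(x2, y2)
        assert abs(equation_7(x, y)) < 1e-9, (t, sign, equation_7(x, y))
print("all sample points lie on the conic")
```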


For this conic we have
• introduced matrices A and J
• orthogonally diagonalised the matrix A to find the orthogonal transition matrix P, which rotates the (x, y)-axes by θ = cos⁻¹(1/√5) to get the (x', y')-axes
• translated by √5 in the x' direction to get the (x'', y'')-axes.

This is illustrated in Figure 18.

Figure 18 Moving the axes to get the equation of the conic in standard form (λ = −3 then λ = 2). Panels: align the axes; translate the origin.


What would the equation of this conic have been if the eigenvalues had 
been chosen in the opposite order? The next exercise investigates this 
using the equation you found in Exercise C142. 


Exercise C143 


Write the equation of the conic 


x² − 4xy − 2y² + 6x + 12y + 21 = 0

in standard form by completing the square in the equation

2(x')² − 3(y')² + 6√5y' + 21 = 0

and then making a substitution to get coordinates (x'', y'').

Figure 19 illustrates how the axes have been moved with the eigenvalues in the order λ = 2 then λ = −3, as in Exercise C143: the axes are reflected and then translated.

Figure 19 Moving the axes to get the equation of the conic in standard form (λ = 2 then λ = −3). Panels: align the axes; translate the origin.


The equations in standard form found for the conic with equation (7) are

$$\frac{(x'')^2}{12} - \frac{(y'')^2}{18} = 1, \quad \text{for } \lambda = -3 \text{ then } \lambda = 2,$$

and

$$-\frac{(x'')^2}{18} + \frac{(y'')^2}{12} = 1, \quad \text{for } \lambda = 2 \text{ then } \lambda = -3.$$

In the second case the hyperbola is not in standard position with respect to these new axes, since the (x'')² term is negative and the (y'')² term is positive.

It is clear that the roles of x'' and y'' have been interchanged.
Geometrically, the new axes of the plane have been interchanged, so the 
hyperbola has related, but different, equations in relation to these different 
choices of axes. However, both equations are in the standard form for a 
hyperbola, so the choice of the order of the eigenvalues does not affect the 
conclusion that this conic is a hyperbola. 




Ellipse and hyperbola 


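In general, if neither eigenvalue is 0, then completing the squares in equation (10) gives an equation of the form

$$\lambda_1\left(x' + \frac{f}{2\lambda_1}\right)^2 - \lambda_1\left(\frac{f}{2\lambda_1}\right)^2 + \lambda_2\left(y' + \frac{g}{2\lambda_2}\right)^2 - \lambda_2\left(\frac{g}{2\lambda_2}\right)^2 + H = 0,$$

which can be written as

λ₁(x'')² + λ₂(y'')² = K,

where

$$x'' = x' + \frac{f}{2\lambda_1}, \quad y'' = y' + \frac{g}{2\lambda_2} \quad\text{and}\quad K = \frac{f^2}{4\lambda_1} + \frac{g^2}{4\lambda_2} - H.$$

Writing the equation in standard form gives

$$\frac{(x'')^2}{K/\lambda_1} + \frac{(y'')^2}{K/\lambda_2} = 1,$$

which is the equation of an ellipse if both K/λ₁ and K/λ₂ are positive, and a hyperbola if one is negative and the other positive. (No other possibility can occur for a non-degenerate conic, although we do not explicitly show this.)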
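The general argument above translates directly into a small classifier. This Python sketch (the function name and the tolerance used to detect K = 0 are our own assumptions; floating-point arithmetic is used throughout) computes K and the denominators K/λ₁ and K/λ₂ of the standard form, and reports the type of conic from their signs.

```python
import math

def classify_central_conic(lam1, lam2, f, g, H, tol=1e-12):
    """Classify lam1 x'^2 + lam2 y'^2 + f x' + g y' + H = 0 when lam1, lam2 are non-zero.

    Completing the squares gives lam1 (x'')^2 + lam2 (y'')^2 = K with
    K = f^2/(4 lam1) + g^2/(4 lam2) - H. Returns the type of conic and the
    denominators K/lam1 and K/lam2 of the standard form.
    """
    if lam1 == 0 or lam2 == 0:
        raise ValueError("both eigenvalues must be non-zero")
    K = f * f / (4 * lam1) + g * g / (4 * lam2) - H
    if abs(K) < tol:
        raise ValueError("K = 0, so the conic is degenerate")
    d1, d2 = K / lam1, K / lam2
    if d1 > 0 and d2 > 0:
        kind = "ellipse"
    elif d1 * d2 < 0:
        kind = "hyperbola"
    else:
        kind = "no real points"   # both denominators negative: excluded for non-degenerate conics
    return kind, d1, d2

# Worked Exercise C73 after aligning the axes: -3(x')^2 + 2(y')^2 + 6*sqrt(5) x' + 21 = 0
print(classify_central_conic(-3, 2, 6 * math.sqrt(5), 0, 21))
# hyperbola, with denominators approximately 12 and -18
```

For Worked Exercise C73 this gives K = −36, so the denominators are 12 and −18, in agreement with the hyperbola found earlier.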


Parabola 


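In general, if one eigenvalue is 0, say λ₁ = 0 and λ₂ ≠ 0, then equation (10) has the form

λ₂(y')² + fx' + gy' + H = 0.

Completing the square in this equation gives

$$fx' + \lambda_2\left(y' + \frac{g}{2\lambda_2}\right)^2 - \lambda_2\left(\frac{g}{2\lambda_2}\right)^2 + H = 0,$$

which can be written as

λ₂(y'')² + fx'' = 0,

where

$$y'' = y' + \frac{g}{2\lambda_2} \quad\text{and}\quad x'' = x' + \frac{1}{f}\left(H - \frac{g^2}{4\lambda_2}\right).$$

(Here f ≠ 0, since otherwise the conic would be degenerate.) Writing the equation in standard form gives

$$(y'')^2 = -\frac{f}{\lambda_2}\,x'',$$

which is the equation of a parabola.

If λ₁ ≠ 0 and λ₂ = 0, then we obtain the similar equation

$$(x'')^2 = -\frac{g}{\lambda_1}\,y'',$$

which is also the equation of a parabola.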
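A companion sketch for the parabola case, under the same assumptions (hypothetical function name; λ₁ = 0, λ₂ ≠ 0 and f ≠ 0), returns the coefficient −f/λ₂ of the standard form together with the two shifts used to translate the origin. The example equation 2(y')² + 8x' + 4y' + 6 = 0 is made up for illustration.

```python
def parabola_standard_form(lam2, f, g, H):
    """Standard form of lam2*(y')^2 + f*x' + g*y' + H = 0 (the case lam1 = 0).

    With y'' = y' + g/(2*lam2) and x'' = x' + (H - g^2/(4*lam2))/f the equation
    becomes lam2*(y'')^2 + f*x'' = 0, that is (y'')^2 = c*x'' with c = -f/lam2.
    Returns (c, x_shift, y_shift), where x'' = x' + x_shift and y'' = y' + y_shift.
    """
    if lam2 == 0 or f == 0:
        raise ValueError("need lam2 != 0 and f != 0 for a non-degenerate parabola")
    y_shift = g / (2 * lam2)
    x_shift = (H - g * g / (4 * lam2)) / f
    return -f / lam2, x_shift, y_shift

# Hypothetical example: 2(y')^2 + 8x' + 4y' + 6 = 0
print(parabola_standard_form(2, 8, 4, 6))   # (-4.0, 0.5, 1.0)
```

For this example y'' = y' + 1 and x'' = x' + 1/2, giving (y'')² = −4x''. The case λ₂ = 0 is handled by interchanging the roles of x' and y'.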


Summarising the method 


There are several steps involved in writing the equation of a conic in 
standard form, so we summarise this method in the following strategy. 


Strategy C24 


To write the non-degenerate conic with equation 
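Ax² + Bxy + Cy² + Fx + Gy + H = 0

in standard form, do the following.

1. Introduce matrices:
• write down
$$A = \begin{pmatrix} A & \tfrac{1}{2}B \\ \tfrac{1}{2}B & C \end{pmatrix} \quad\text{and}\quad J = \begin{pmatrix} F \\ G \end{pmatrix}.$$

2. Align the axes:
• orthogonally diagonalise A to get
$$P^TAP = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$$
• find (f  g) = JᵀP, and write the conic in the form
λ₁(x')² + λ₂(y')² + fx' + gy' + H = 0.

3. Translate the origin:
• complete the squares
• make a substitution to change to the coordinate system (x'', y'').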


The order in which the eigenvalues are chosen does not affect the form of 
the equation obtained: it will be the standard form for an ellipse, a 
hyperbola or a parabola. 


The following worked exercise and exercises illustrate this strategy. 


Worked Exercise C74 


Use Strategy C24 to write the non-degenerate conic with equation 


5x² + 4xy + 5y² + 20x + 8y − 1 = 0


in standard form. Is this conic an ellipse, a parabola or a hyperbola? 


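Solution

1. Introduce matrices.

We have

$$A = \begin{pmatrix} 5 & 2 \\ 2 & 5 \end{pmatrix} \quad\text{and}\quad J = \begin{pmatrix} 20 \\ 8 \end{pmatrix}.$$
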

2. Align the axes.

We orthogonally diagonalised A in Worked Exercise C71.

We have

$$P^TAP = \begin{pmatrix} 7 & 0 \\ 0 & 3 \end{pmatrix}, \quad\text{where}\quad P = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \end{pmatrix},$$

so

$$(f \;\; g) = (20 \;\; 8)\begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \end{pmatrix} = \left(\tfrac{28}{\sqrt{2}} \;\; \tfrac{12}{\sqrt{2}}\right) = (14\sqrt{2} \;\; 6\sqrt{2}).$$

The equation of the conic is now

7(x')² + 3(y')² + 14√2x' + 6√2y' − 1 = 0.

3. Translate the origin.

To keep track of the terms when completing the square, we first collect the x' terms and the y' terms. We take out the coefficients of (x')² and (y')² as factors.

We write this equation as

7((x')² + 2√2x') + 3((y')² + 2√2y') − 1 = 0.

Completing the squares in this equation, we obtain

7(x' + √2)² − 14 + 3(y' + √2)² − 6 − 1 = 0.

We substitute x'' = x' + √2 and y'' = y' + √2 into this equation and simplify to obtain

7(x'')² + 3(y'')² − 21 = 0.

The equation of the conic in standard form is

$$\frac{(x'')^2}{3} + \frac{(y'')^2}{7} = 1.$$

The conic is an ellipse.

We can see that this ellipse is not in standard position with respect to these new axes, since 3 < 7.




Exercise C144 


Use Strategy C24 to write the non-degenerate conic with equation 


9x² − 4xy + 6y² − 10x − 20y − 5 = 0


in standard form. Is the conic an ellipse, a parabola or a hyperbola? 


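(In Exercise C137(a) you found that

$$E = \left\{\left(-\tfrac{2}{\sqrt{5}}, \tfrac{1}{\sqrt{5}}\right), \left(\tfrac{1}{\sqrt{5}}, \tfrac{2}{\sqrt{5}}\right)\right\}$$

is an orthonormal eigenvector basis for the matrix A of this conic with respect to the eigenvalues λ = 10 and λ = 5.)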


Exercise C145 


Use Strategy C24 to write the non-degenerate conic with equation 


x² − 4xy + 4y² − 6x − 8y + 5 = 0


in standard form. Is the conic an ellipse, a parabola or a hyperbola? 


4.2 Classifying quadrics 


Quadrics, or quadric surfaces, are surfaces in ℝ³. They are the three-dimensional analogues of conics.

Definition
A quadric in ℝ³ is the set of points (x, y, z) that satisfy an equation of the form

Ax² + By² + Cz² + Fxy + Gyz + Hxz + Jx + Ky + Lz + M = 0,

where A to M are real numbers, and A, B, C, F, G and H are not all 0.

In general the situation is more complicated than for conics, and the general situation is beyond the scope of this module. However, it can be shown that there are nine types of quadric involving curved surfaces in ℝ³. Each of these types can be positioned in space to be in standard position; that is, with its axes aligned with the x-, y- and z-axes in a similar manner to the non-degenerate conics. These quadrics in standard position have easily recognisable equations, and the different types can be distinguished by their curves of intersection with those planes parallel to the coordinate planes that meet the quadric in a non-trivial intersection. Figure 20 shows some curves of intersection for a sphere; they are all circles.




Figure 20 Some curves of 
intersection of a sphere 




The curves of intersection of a non-degenerate quadric are non-degenerate conics. There are five types of non-degenerate quadric:
• the ellipsoid (which includes the sphere)
• the elliptic paraboloid
• the hyperbolic paraboloid
• the hyperboloid of one sheet
• the hyperboloid of two sheets.

Table 1 illustrates each of these quadrics and gives the equation in standard position, as well as specifying the curves of intersection.

There are four types of degenerate quadric involving curved surfaces:
• the elliptic cone
• the elliptic cylinder
• the parabolic cylinder
• the hyperbolic cylinder.

The curves of intersection of these include non-degenerate conics, degenerate conics and pairs of parallel lines. The elliptic cone in standard position is illustrated in Table 1, where the equation is given and the curves of intersection specified. The elliptic cone can be considered as intermediate between the hyperboloids of one and two sheets: it is the case in which the two sheets touch at a point. The three types of cylinder in standard position, illustrated in Figure 21, are surfaces whose equations do not involve z explicitly.


Figure 21 Degenerate quadrics: (a) elliptic cylinder (b) parabolic cylinder 
and (c) hyperbolic cylinder 


The only degenerate quadrics we will consider for the remainder of the 
linear algebra topic are elliptic cones, thus giving the following list of six 
quadrics, all included in Table 1: the ellipsoid (including the sphere), the 
elliptic paraboloid, the hyperbolic paraboloid, the hyperboloid of one 
sheet, the hyperboloid of two sheets and the elliptic cone. 


Table 1 Quadrics: equation in standard position and the curves of intersection

Ellipsoid: x²/a² + y²/b² + z²/c² = 1; curves of intersection: ellipse

Elliptic paraboloid: z = x²/a² + y²/b²; curves of intersection: ellipse or parabola

Hyperbolic paraboloid: z = x²/a² − y²/b²; curves of intersection: hyperbola or parabola

Hyperboloid of one sheet: x²/a² + y²/b² − z²/c² = 1; curves of intersection: ellipse or hyperbola

Hyperboloid of two sheets: x²/a² + y²/b² − z²/c² = −1; curves of intersection: ellipse or hyperbola

Elliptic cone: x²/a² + y²/b² − z²/c² = 0; curves of intersection: ellipse or hyperbola (or a degenerate conic)
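The sign patterns in Table 1 suggest a quick test for the quadrics that have a centre. Assuming the equation has already been reduced to λ₁(x'')² + λ₂(y'')² + λ₃(z'')² = K with no eigenvalue equal to 0, the following Python sketch (hypothetical function name, our own illustration) counts how many of the denominators K/λᵢ are positive. It deliberately does not handle the paraboloids or cylinders, which have a zero eigenvalue.

```python
def classify_central_quadric(lams, K):
    """Classify l1 (x'')^2 + l2 (y'')^2 + l3 (z'')^2 = K when no l_i is 0.

    Covers the ellipsoid, the two hyperboloids and the elliptic cone of Table 1;
    quadrics with a zero eigenvalue need a different standard form.
    """
    if any(lam == 0 for lam in lams):
        raise ValueError("a zero eigenvalue needs a different standard form")
    if K == 0:
        # x^2/a^2 + y^2/b^2 - z^2/c^2 = 0 up to relabelling: the signs must be mixed
        mixed = any(lam > 0 for lam in lams) and any(lam < 0 for lam in lams)
        return "elliptic cone" if mixed else "single point (degenerate)"
    positive = sum(1 for lam in lams if K / lam > 0)   # positive denominators K/l_i
    return {3: "ellipsoid",
            2: "hyperboloid of one sheet",
            1: "hyperboloid of two sheets",
            0: "no real points"}[positive]

print(classify_central_quadric([6, 3, 2], 18))    # ellipsoid
print(classify_central_quadric([1, 1, -1], 0))    # elliptic cone
```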


Gaspard Monge

Jean Nicolas Pierre Hachette



The first systematic classification of quadric surfaces was by Leonhard Euler (1707-1783) in his celebrated Introductio in analysin infinitorum (1748), the textbook in which he laid down the foundations of analysis. There he treated surfaces of second degree as a family of quadrics in space analogous to the plane conic sections. The subject was developed in a more rigorous way by Gaspard Monge (1746-1818) and Jean Nicolas Pierre Hachette (1769-1834) who, in 1802, provided an algebraic study of quadric surfaces, which was later published as a textbook. Both Monge and Hachette were professors at the famous École Polytechnique in Paris. This college was founded at the end of the eighteenth century to provide students with a mathematical and scientific education, and to prepare them for entry to the prestigious Grandes Écoles, higher education establishments for the training of civil and military engineers.


As with conics, to identify a given quadric from its equation, we will align 
the axes and translate the origin to obtain an equation that resembles the 
equation of a quadric in standard position: we say that such an equation of 
a quadric is in standard form. So, for example, the equation

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$$

is an equation of a hyperboloid of one sheet in standard form, although it is not in standard position.

To write the equation of a quadric in standard form, we use the same techniques that we used for conics: introducing matrices, orthogonal diagonalisation and completing the square. We omit the justification, which is analogous to that for conics.


We summarise this method in the following strategy. 


Strategy C25 
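To write the quadric with equation

Ax² + By² + Cz² + Fxy + Gyz + Hxz + Jx + Ky + Lz + M = 0

in standard form, do the following.

1. Introduce matrices:
• write down the matrices
$$A = \begin{pmatrix} A & \tfrac{1}{2}F & \tfrac{1}{2}H \\ \tfrac{1}{2}F & B & \tfrac{1}{2}G \\ \tfrac{1}{2}H & \tfrac{1}{2}G & C \end{pmatrix} \quad\text{and}\quad J = \begin{pmatrix} J \\ K \\ L \end{pmatrix}.$$

2. Align the axes:
• orthogonally diagonalise A to get
$$P^TAP = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix}$$
• find (f  g  h) = JᵀP, and write the quadric in the form
λ₁(x')² + λ₂(y')² + λ₃(z')² + fx' + gy' + hz' + M = 0.

3. Translate the origin:
• complete the squares
• make a substitution to change to the coordinate system (x'', y'', z'').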
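For checking hand calculations in step 2, a numerical method can be handy. The Python sketch below uses the classical Jacobi rotation method, a standard numerical technique swapped in here purely for illustration; it is not the method of Strategy C22, which solves the characteristic equation exactly. Each step applies a plane rotation J chosen to make one off-diagonal entry of JᵀAJ zero, and the product of the rotations accumulates into an orthogonal matrix P with PᵀAP diagonal. The helper names, the tolerance and the sweep limit are our own assumptions.

```python
import math

def matmul(X, Y):
    """Product XY of two n x n matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*X)]

def jacobi_eigen(A, tol=1e-20, max_sweeps=50):
    """Orthogonally diagonalise a real symmetric matrix by Jacobi rotations.

    Returns (eigenvalues, P): the columns of P are orthonormal eigenvectors and
    P^T A P is diagonal up to rounding, with the eigenvalues on its diagonal.
    """
    n = len(A)
    a = [[float(x) for x in row] for row in A]            # becomes P^T A P
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:                                     # numerically diagonal
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # angle chosen so that the (p, q) entry of J^T a J is zero
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                J = [[float(i == j) for j in range(n)] for i in range(n)]
                J[p][p], J[q][q], J[p][q], J[q][p] = c, c, s, -s
                a = matmul(transpose(J), matmul(a, J))
                P = matmul(P, J)
    return [a[i][i] for i in range(n)], P

A = [[5, -1, -1], [-1, 3, 1], [-1, 1, 3]]   # matrix A of Worked Exercise C75
eigenvalues, P = jacobi_eigen(A)
print(sorted(round(v, 10) for v in eigenvalues))   # [2.0, 3.0, 6.0]
```

The columns of P produced this way need not match a hand-calculated transition matrix in order or sign; any such choice still gives a valid orthonormal eigenvector basis.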


The following worked exercise and exercises illustrate this strategy. 


Worked Exercise C75 


Use Strategy C25 to write the quadric with equation 
5x² + 3y² + 3z² − 2xy + 2yz − 2xz − 10x + 6y − 2z − 9 = 0


in standard form. Which of the six types of quadric does this represent? 


Solution 


As with conics, since some parts of this working can be quite long, we number the strategy steps in the solution.


1. Introduce matrices. 


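We have

$$A = \begin{pmatrix} 5 & -1 & -1 \\ -1 & 3 & 1 \\ -1 & 1 & 3 \end{pmatrix} \quad\text{and}\quad J = \begin{pmatrix} -10 \\ 6 \\ -2 \end{pmatrix}.$$

2. Align the axes.

You orthogonally diagonalised A in Exercise C137(b).

We have

$$P^TAP = \begin{pmatrix} 6 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{pmatrix},$$

where

$$P = \begin{pmatrix} -\tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} & 0 \\ \tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} & -\tfrac{1}{\sqrt{2}} \end{pmatrix}.$$

Since det P = 1, this transition matrix represents a rotation of the basis vectors, but this fact does not concern us here.

So

$$(f \;\; g \;\; h) = J^TP = (-10 \;\; 6 \;\; -2)\,P = (4\sqrt{6} \;\; -2\sqrt{3} \;\; 4\sqrt{2}).$$

The equation of the quadric is now

6(x')² + 3(y')² + 2(z')² + 4√6x' − 2√3y' + 4√2z' − 9 = 0.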
3. Translate the origin. 


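We write this equation as

$$6\left((x')^2 + \frac{4}{\sqrt{6}}\,x'\right) + 3\left((y')^2 - \frac{2}{\sqrt{3}}\,y'\right) + 2\left((z')^2 + 2\sqrt{2}\,z'\right) - 9 = 0.$$

Completing the squares in this equation, we obtain

$$6\left(x' + \frac{2}{\sqrt{6}}\right)^2 - 4 + 3\left(y' - \frac{1}{\sqrt{3}}\right)^2 - 1 + 2\left(z' + \sqrt{2}\right)^2 - 4 - 9 = 0.$$

Substituting

$$x'' = x' + \frac{2}{\sqrt{6}}, \quad y'' = y' - \frac{1}{\sqrt{3}} \quad\text{and}\quad z'' = z' + \sqrt{2}$$

in this equation and simplifying, we obtain

6(x'')² + 3(y'')² + 2(z'')² = 18.

The equation of the quadric in standard form is

$$\frac{(x'')^2}{3} + \frac{(y'')^2}{6} + \frac{(z'')^2}{9} = 1.$$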


This is the equation of an ellipsoid. 
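The arithmetic in this worked exercise can be double-checked numerically. The Python sketch below assumes the transition matrix P whose columns are the unit eigenvectors (−2, 1, 1)/√6, (1, 1, 1)/√3 and (0, 1, −1)/√2 for λ = 6, 3 and 2 (a choice consistent with the values of f, g and h found above). It recomputes (f g h) = JᵀP, the shifts used to translate the origin, and the constant 18 on the right-hand side. The variable names are our own.

```python
import math

r6, r3, r2 = math.sqrt(6), math.sqrt(3), math.sqrt(2)
# Assumed transition matrix: columns are unit eigenvectors of A for 6, 3 and 2
P = [[-2 / r6, 1 / r3, 0.0],
     [1 / r6, 1 / r3, 1 / r2],
     [1 / r6, 1 / r3, -1 / r2]]
lams = [6, 3, 2]
J = [-10, 6, -2]
M = -9

# Step 2: (f g h) = J^T P
fgh = [sum(J[k] * P[k][j] for k in range(3)) for j in range(3)]

# Step 3: lam t^2 + c t = lam (t + c/(2 lam))^2 - c^2/(4 lam) for each coordinate
shifts = [c / (2 * lam) for c, lam in zip(fgh, lams)]         # x'' = x' + shift, etc.
K = -M + sum(c * c / (4 * lam) for c, lam in zip(fgh, lams))  # right-hand side

print([round(v, 6) for v in fgh])           # approx 4*sqrt(6), -2*sqrt(3), 4*sqrt(2)
print([round(s, 6) for s in shifts])        # approx 2/sqrt(6), -1/sqrt(3), sqrt(2)
print(round(K, 9))                          # 18.0
print([round(K / lam, 9) for lam in lams])  # denominators 3.0, 6.0, 9.0
```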


Exercise C146 


Use Strategy C25 to write the quadric with equation 
x² + y² + z² − 2x + 4y − 6z − 11 = 0
in standard form. Which of the six types of quadric does this represent? 




Exercise C147 


Use Strategy C25 to write the quadric with equation 
4x² + 3y² + 2z² + 4xy + 4yz + 12x + 12z + 18 = 0
in standard form. Which of the six types of quadric does this represent? 


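(At the start of Subsection 3.1 we found that

$$E = \left\{\left(\tfrac{2}{3}, \tfrac{2}{3}, \tfrac{1}{3}\right), \left(-\tfrac{2}{3}, \tfrac{1}{3}, \tfrac{2}{3}\right), \left(\tfrac{1}{3}, -\tfrac{2}{3}, \tfrac{2}{3}\right)\right\}$$

is an orthonormal eigenvector basis for the matrix A of this quadric with respect to the eigenvalues λ = 6, λ = 3 and λ = 0.)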


Summary 


In this unit you have met eigenvectors and eigenvalues: an eigenvector of a linear transformation t : V → V is a non-zero vector v that is mapped by t to a scalar multiple of itself, and this scalar is the corresponding eigenvalue λ. Since such a linear transformation always has a square matrix representation, you have seen that eigenvectors and eigenvalues can equivalently be defined in terms of matrices: Av = λv. You have found eigenvalues and eigenvectors by solving the corresponding characteristic equation det(A − λI) = 0. You have seen that there may be no eigenvalues, for example when t is the rotation of ℝ² through π/2, and that all the eigenvectors corresponding to a given eigenvalue λ, together with the zero vector, form a subspace S(λ) of V whose dimension is never greater than the multiplicity of the eigenvalue.

You have investigated when t has an eigenvector basis E, that is, a basis comprising only eigenvectors of t, and you have met transition matrices P from a basis E of V to the standard basis. You have seen (Theorem C60) that the transition matrix P maps E-coordinates of V to standard coordinates of V and that P is invertible. You have learned (Theorem C62) that whenever an eigenvector basis can be found, the transition matrix P can be used to convert the matrix A of t (with respect to the standard basis) into a diagonal matrix D (the matrix of t with respect to this eigenvector basis) via the relation D = P⁻¹AP. Furthermore, when t has a symmetric matrix representation, the eigenvectors corresponding to different eigenvalues are orthogonal (Theorem C64), and an eigenvector basis can always be found. In addition, the basis vectors can be chosen to give an orthonormal eigenvector basis, so that the transition matrix is an orthogonal matrix satisfying Pᵀ = P⁻¹, giving D = PᵀAP.




Thus diagonalising matrices involves the main ideas you have studied 
throughout this book on linear algebra: vectors, matrices, vector spaces, 
bases and linear transformations. 


In the final section you have seen how these techniques can be used to 
identify the type of a conic, or quadric, from its equation. 


Learning outcomes 


After working through this unit, you should be able to: 


• explain the meaning of the terms eigenvalue, eigenvector, characteristic equation and eigenspace
• recognise the geometric interpretation of eigenvectors and eigenspaces in special cases
• find the eigenvalues and eigenvectors of a given 2 × 2 or 3 × 3 matrix
• describe some basic properties of eigenvalues and eigenvectors
• write down the matrix of a linear transformation t with respect to a given eigenvector basis of t
• write down the transition matrix from an eigenvector basis to the standard basis
• diagonalise a given square matrix, if possible
• understand that any symmetric matrix can be orthogonally diagonalised
• orthogonally diagonalise a given symmetric matrix
• describe some basic properties of orthogonal matrices
• write the equation of a given non-degenerate conic in standard form and hence classify it
• understand the term quadric and recognise the six types of quadric covered
• write the equation of a given quadric in standard form and hence classify it.


Solutions to exercises 


Solution to Exercise C115 


We have
t(2, −2) = (2 − 8, 2 + 4) = (−6, 6) = −3(2, −2)
and
t(−7, 7) = (−7 + 28, −7 − 14) = (21, −21) = −3(−7, 7).

In each case the original vector is scaled by the factor −3.


Solution to Exercise C116 

(a) We have t(0, 1) = (4, −2), t(1, 2) = (9, −3) and t(4, 1) = (8, 2).

(b) The linear transformation t maps the line joining the points (0, 0) and (4, 1) to the line joining the points (0, 0) and (8, 2). But (8, 2) = 2(4, 1), so these lines are the same and both can be written as x = 4y. Therefore the line x = 4y is mapped to itself by the linear transformation t.

(c) We have
t(4k, k) = (4k + 4k, 4k − 2k) = (8k, 2k) = 2(4k, k),
so any vector lying along the line x = 4y is scaled by the factor 2.


Solution to Exercise C117 


(a) A reflection t in the line y = x maps the point (x, y) to the point (y, x). Each point on the line y = x is mapped to itself, since
t(k, k) = (k, k) = 1(k, k),
so the non-zero vectors (k, k) are eigenvectors with corresponding eigenvalue 1.

Each point on the line y = −x is mapped to another point on the line y = −x, since
t(k, −k) = (−k, k) = −1(k, −k),
so the non-zero vectors (k, −k) are eigenvectors with corresponding eigenvalue −1.

(b) A 2-dilation t maps the point (x, y) to the point (2x, 2y). Every line through the origin is mapped to itself; that is, every non-zero vector in the plane is an eigenvector of t. Let k and l be real numbers which are not both zero. Then
t(k, l) = (2k, 2l) = 2(k, l),
so the non-zero vectors (k, l) are eigenvectors with corresponding eigenvalue 2.

(c) An anticlockwise rotation t through π/2 maps the point (x, y) to the point (−y, x). No line through the origin is mapped to itself by t, so t has no eigenvectors.

(d) An anticlockwise rotation t through π maps the point (x, y) to the point (−x, −y). Each line through the origin is mapped to itself; that is, each non-zero vector in the plane is an eigenvector of t.

Let k and l be real numbers that are not both zero. Then
t(k, l) = (−k, −l) = −1(k, l),
so the non-zero vectors (k, l) are eigenvectors with corresponding eigenvalue −1.


Solution to Exercise C118 


(a) We wish to find those vectors (x, y) that are mapped to scalar multiples of themselves; that is, the vectors that satisfy
(−5x + 3y, 6x − 2y) = (λx, λy).
Equating coordinates, we obtain the system
−5x + 3y = λx
6x − 2y = λy,
which we write as
(−5 − λ)x + 3y = 0
6x + (−2 − λ)y = 0.

(b) Non-zero solutions to the eigenvector equations exist if and only if the determinant of the coefficient matrix is 0; that is, if and only if

$$\begin{vmatrix} -5-\lambda & 3 \\ 6 & -2-\lambda \end{vmatrix} = 0.$$

We expand the determinant and obtain
(−5 − λ)(−2 − λ) − 18 = 0,
which simplifies to
λ² + 7λ − 8 = 0.

The eigenvalues of t are the solutions to this characteristic equation. We have
λ² + 7λ − 8 = (λ − 1)(λ + 8) = 0,
so the eigenvalues are λ = 1 and λ = −8.

(c) To find the corresponding eigenvectors, we consider each value of λ in turn.

λ = 1: The eigenvector equations become
−6x + 3y = 0
6x − 3y = 0.
These equations are equivalent to the single equation
2x − y = 0.
Thus the eigenvectors corresponding to λ = 1 are the non-zero vectors (x, y) for which y = 2x; that is, the vectors of the form
(k, 2k), where k ≠ 0.

λ = −8: The eigenvector equations become
3x + 3y = 0
6x + 6y = 0.
These equations are equivalent to the single equation
x + y = 0.
Thus the eigenvectors corresponding to λ = −8 are the non-zero vectors (x, y) for which y = −x; that is, the vectors of the form
(k, −k), where k ≠ 0.

Thus the eigenvectors of t are the non-zero vectors of the following forms:
(k, 2k), corresponding to λ = 1,
(k, −k), corresponding to λ = −8.
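For a 2 × 2 matrix A with rows (a, b) and (c, d), the characteristic equation always has the form λ² − (a + d)λ + (ad − bc) = 0, so the eigenvalues can be read off with the quadratic formula. The Python sketch below (hypothetical function names, floating-point arithmetic) does this, and finds an eigenvector by taking a non-zero vector orthogonal to a non-zero row of A − λI. It reproduces the eigenvalues found above, where A has rows (−5, 3) and (6, −2).

```python
import math

def real_eigenvalues_2x2(a, b, c, d):
    """Real roots of lambda^2 - (a + d) lambda + (ad - bc) = 0 for A = [[a, b], [c, d]]."""
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        return ()                     # no real eigenvalues, e.g. a rotation through pi/2
    root = math.sqrt(disc)
    return ((trace - root) / 2, (trace + root) / 2)

def eigenvector_2x2(a, b, c, d, lam):
    """A non-zero solution of (A - lam I)v = 0, for an eigenvalue lam of A."""
    for v in ((b, lam - a), (lam - d, c)):   # each is orthogonal to a row of A - lam I
        if v != (0, 0):
            return v
    return (1, 0)                             # A = lam I: every non-zero vector works

print(real_eigenvalues_2x2(-5, 3, 6, -2))       # (-8.0, 1.0)
print(real_eigenvalues_2x2(0, -1, 1, 0))        # ()
print(eigenvector_2x2(-5, 3, 6, -2, 1.0))       # (3, 6.0), a multiple of (1, 2)
```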


Solution to Exercise C119 


(a) The matrix of t with respect to the standard basis for ℝ² is

$$A = \begin{pmatrix} 1 & 3 \\ 2 & -4 \end{pmatrix}.$$

We use Strategy C18 to find the eigenvalues and eigenvectors of A, which are the same as those of t.

First we find the eigenvalues of A.
The characteristic equation of A is det(A − λI) = 0; that is,

$$\begin{vmatrix} 1-\lambda & 3 \\ 2 & -4-\lambda \end{vmatrix} = 0.$$

We expand the determinant and obtain
(1 − λ)(−4 − λ) − 6 = 0,
which simplifies to
λ² + 3λ − 10 = (λ − 2)(λ + 5) = 0.
The eigenvalues of A are therefore λ = 2 and λ = −5.

Next we find the eigenvectors of A.
The eigenvector equations are
(1 − λ)x + 3y = 0
2x + (−4 − λ)y = 0.

λ = 2: The eigenvector equations become
−x + 3y = 0
2x − 6y = 0.
These equations are equivalent to the single equation
x − 3y = 0.
Thus the eigenvectors corresponding to λ = 2 are the non-zero vectors for which x = 3y; that is, the vectors of the form
(3k, k), where k ≠ 0.

λ = −5: The eigenvector equations become
6x + 3y = 0
2x + y = 0.
These equations are equivalent to the single equation
2x + y = 0.
Thus the eigenvectors corresponding to λ = −5 are the non-zero vectors for which y = −2x; that is, the vectors of the form
(k, −2k), where k ≠ 0.

Thus the eigenvectors of t are the non-zero vectors of the following forms:
(3k, k), corresponding to λ = 2,
(k, −2k), corresponding to λ = −5.

(b) The matrix of t with respect to the standard basis for ℝ² is

$$A = \begin{pmatrix} 1 & -2 \\ -2 & -2 \end{pmatrix}.$$

We use Strategy C18 to find the eigenvalues and eigenvectors of A, which are the same as those of t.

First we find the eigenvalues of A.
The characteristic equation of A is det(A − λI) = 0; that is,

$$\begin{vmatrix} 1-\lambda & -2 \\ -2 & -2-\lambda \end{vmatrix} = 0.$$

We expand the determinant and obtain
(1 − λ)(−2 − λ) − 4 = 0,
which simplifies to
λ² + λ − 6 = (λ − 2)(λ + 3) = 0.
The eigenvalues of A are therefore λ = 2 and λ = −3.

Next we find the eigenvectors of A.
The eigenvector equations are
(1 − λ)x − 2y = 0
−2x + (−2 − λ)y = 0.

λ = 2: The eigenvector equations become
−x − 2y = 0
−2x − 4y = 0.
These equations are equivalent to the single equation
x + 2y = 0.
Thus the eigenvectors corresponding to λ = 2 are the non-zero vectors for which x = −2y; that is, the vectors of the form
(−2k, k), where k ≠ 0.

λ = −3: The eigenvector equations become
4x − 2y = 0
−2x + y = 0.
These equations are equivalent to the single equation
2x − y = 0.
Thus the eigenvectors corresponding to λ = −3 are the non-zero vectors for which y = 2x; that is, the vectors of the form
(k, 2k), where k ≠ 0.

Thus the eigenvectors of t are the non-zero vectors of the following forms:
(−2k, k), corresponding to λ = 2,
(k, 2k), corresponding to λ = −3.


Solution to Exercise C120 


The matrix of t with respect to the standard basis 
for R? is 


4 2 0 
A= [2 3 2 
0 2 2 
We use Strategy C18 to find the eigenvalues and 
eigenvectors of A, which are the same as those of t. 


First we find the eigenvalues of A. 


The characteristic equation is det(A — AI) = 0; 
that is, 


4—x 2 0 
2 3-A 2 |=0. 
0 2 2—A 


We expand the determinant and obtain 


3— à 2 2 2 


(4—A) p) gA o JA 


|+0=0. 


Simplifying this expression, we obtain 


(4 = A)((3 = A) (2 = A) — 4) — 2(2(2 — A)) = 0, 


or 
dA? — 9d? + 18\ = 0. 


There is no constant term, so we take out the 
factor A, then factorise the remaining quadratic 
factor: 


AO? — 9A + 18) =A0=6)A=3)=0. 


The eigenvalues of A are therefore \ = 0, \ = 6 
and A= 3. 


357 


Unit C4 Eigenvectors 


(As a quick check 4 + 3 + 2 = 9 = 6 + 3 + 0, so the 
sum of the eigenvalues is indeed equal to the sum 
of the diagonal entries.) 


Next we find the eigenvectors of A. 


The eigenvector equations are 


(4—A)a + 2y = 
2x + (3—A)y 22 =) 
2y+(2-A)z= 
The eigenvector equations become 
—2u + 2y =0 
2x = 3y + 2z =0 
2y —4z=0. 


The first and third equations imply that 

x = y and y = 2z, so x = 2z. These satisfy 
the second equation. Thus the eigenvectors 
corresponding to the eigenvalue À = 6 are the 
non-zero vectors (x,y,z) satisfying y = 2z 
and x = 2z; that is, the vectors of the form 


(2k,2k,k), where k £0. 
The eigenvector equations become 


xu + 2y =0 
22 +2z=0 
2y— z=0. 


The first and second equations imply that 

xz = —2y and z = —x, so z = 2y. These 
satisfy the third equation. Thus the 
eigenvectors corresponding to the eigenvalue 
A = 3 are the non-zero vectors (x,y,z) 
satisfying x = —2y and z = 2y; that is, the 


vectors of the form 
(—2k,k,2k), where k £0. 


The eigenvector equations become 


4x + 2y =0 
2x2 + 3y + 2z =0 
2y + 2z=0. 


The first and third equations imply that 

y = —2x and z = —y, so z = 2x. These 
satisfy the second equation. Thus the 
eigenvectors corresponding to the eigenvalue 
A = 0 are the non-zero vectors (x,y,z) 


358 


satisfying y = —2x and z = 22; that is, the 
vectors of the form 


(k,—2k,2k), where k £0. 


Thus the eigenvectors of $t$ are the non-zero vectors of the following forms:

$$\begin{aligned} &(2k, 2k, k), && \text{corresponding to } \lambda = 6,\\ &(-2k, k, 2k), && \text{corresponding to } \lambda = 3,\\ &(k, -2k, 2k), && \text{corresponding to } \lambda = 0. \end{aligned}$$
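A quick way to confirm the three families is to multiply them by $A$. The sketch below (pure Python, standard library only) checks $A\mathbf{v} = \lambda\mathbf{v}$ for the representative $k = 1$ of each family and for a couple of other non-zero multiples, since any non-zero multiple of an eigenvector is again an eigenvector.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


A = [[4, 2, 0], [2, 3, 2], [0, 2, 2]]

# One representative eigenvector (k = 1) for each eigenvalue.
eigenpairs = {6: (2, 2, 1), 3: (-2, 1, 2), 0: (1, -2, 2)}

for lam, v in eigenpairs.items():
    for k in (1, -3, 5):  # a few non-zero multiples of the representative
        w = [k * x for x in v]
        assert mat_vec(A, w) == [lam * x for x in w]
print("all eigenvector checks passed")
```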


Solution to Exercise C121 
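(a) Let

$$A = \begin{pmatrix} 1 & 2 \\ 0 & 6 \end{pmatrix}.$$

The characteristic equation is $\det(A - \lambda I) = 0$; that is,

$$\begin{vmatrix} 1-\lambda & 2 \\ 0 & 6-\lambda \end{vmatrix} = 0.$$

We expand the determinant and obtain

$$(1-\lambda)(6-\lambda) - 0 = 0.$$

The eigenvalues of $A$ are therefore $\lambda = 1$ and $\lambda = 6$. Notice that these are the diagonal entries of the upper triangular matrix $A$.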


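(b) Let

$$A = \begin{pmatrix} 8 & 0 & 0 \\ 0 & -5 & 0 \\ 0 & 0 & 21 \end{pmatrix}.$$

The characteristic equation is $\det(A - \lambda I) = 0$; that is,

$$\begin{vmatrix} 8-\lambda & 0 & 0 \\ 0 & -5-\lambda & 0 \\ 0 & 0 & 21-\lambda \end{vmatrix} = 0.$$

We expand the determinant and obtain

$$(8-\lambda)\begin{vmatrix} -5-\lambda & 0 \\ 0 & 21-\lambda \end{vmatrix} = 0.$$

Simplifying this expression, we obtain

$$(8-\lambda)\big((-5-\lambda)(21-\lambda) - 0\big) = 0.$$

The eigenvalues of $A$ are therefore $\lambda = 8$, $\lambda = -5$ and $\lambda = 21$. Again, these are the diagonal entries of the diagonal matrix $A$.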


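(c) Let

$$A = \begin{pmatrix} 4 & 0 & 0 \\ 25 & -2 & 0 \\ 17 & \pi & 6 \end{pmatrix}.$$

The characteristic equation is $\det(A - \lambda I) = 0$; that is,

$$\begin{vmatrix} 4-\lambda & 0 & 0 \\ 25 & -2-\lambda & 0 \\ 17 & \pi & 6-\lambda \end{vmatrix} = 0.$$

We expand the determinant and obtain

$$(4-\lambda)\begin{vmatrix} -2-\lambda & 0 \\ \pi & 6-\lambda \end{vmatrix} = 0.$$

Simplifying this expression, we obtain

$$(4-\lambda)\big((-2-\lambda)(6-\lambda) - 0\big) = 0.$$

The eigenvalues of $A$ are therefore $\lambda = 4$, $\lambda = -2$ and $\lambda = 6$. Again, these are the diagonal entries of the lower triangular matrix $A$.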
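The pattern in parts (b) and (c) can be confirmed numerically: for a triangular matrix, $\det(A - \lambda I)$ is the product of the diagonal entries of $A - \lambda I$, so each diagonal entry of $A$ makes it vanish. In this minimal sketch the $(3,2)$ entry of the matrix in part (c) is taken to be `math.pi`; any value there gives the same eigenvalues, which is the point of the exercise.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_value(a, lam):
    """Return det(A - lam*I) for a 3x3 matrix A."""
    return det3([[a[r][s] - (lam if r == s else 0) for s in range(3)] for r in range(3)])


matrices = {
    "(b) diagonal": [[8, 0, 0], [0, -5, 0], [0, 0, 21]],
    # The entry below the diagonal in row 3 does not affect the eigenvalues.
    "(c) lower triangular": [[4, 0, 0], [25, -2, 0], [17, math.pi, 6]],
}

for name, a in matrices.items():
    diagonal = [a[i][i] for i in range(3)]
    # Each diagonal entry is a root of the characteristic equation.
    assert all(abs(char_value(a, d)) < 1e-9 for d in diagonal), name
    print(name, diagonal)
```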


Solution to Exercise C122 


The non-zero vectors of the form $(2k, 2k, k)$ are the eigenvectors of $t$ corresponding to $\lambda = 6$. The eigenspace $S(6)$ is therefore the set of vectors

$$\{(2k, 2k, k) : k \in \mathbb{R}\}.$$

Any vector in $S(6)$ can be written as $k(2, 2, 1)$, so $\{(2, 2, 1)\}$ is a basis for $S(6)$. Thus $S(6)$ has dimension 1.

The non-zero vectors of the form $(-2k, k, 2k)$ are the eigenvectors of $t$ corresponding to $\lambda = 3$. The eigenspace $S(3)$ is therefore the set of vectors

$$\{(-2k, k, 2k) : k \in \mathbb{R}\}.$$

Any vector in $S(3)$ can be written as $k(-2, 1, 2)$, so $\{(-2, 1, 2)\}$ is a basis for $S(3)$. Thus $S(3)$ has dimension 1.


Solution to Exercise C123 


The matrix

$$A = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 4 & 0 \\ 0 & 0 & 4 \end{pmatrix}$$

is triangular, so the eigenvalues are the diagonal entries $\lambda = 1$, $\lambda = 4$ and $\lambda = 4$.

The eigenvector equations are

$$\begin{aligned} (1-\lambda)x + y - z &= 0\\ (4-\lambda)y &= 0\\ (4-\lambda)z &= 0. \end{aligned}$$

The eigenvalue $\lambda = 1$ has multiplicity 1.

The eigenvector equations become

$$\begin{aligned} y - z &= 0\\ 3y &= 0\\ 3z &= 0. \end{aligned}$$

The second and third equations give $y = 0$ and $z = 0$, respectively, which satisfy the first equation. (They give no constraint on $x$.)

Thus the eigenvectors corresponding to the eigenvalue $\lambda = 1$ are the vectors of the form $(k, 0, 0)$, where $k \neq 0$.

The eigenspace $S(1)$ is the set of vectors $\{(k, 0, 0) : k \in \mathbb{R}\}$.

Any vector in $S(1)$ can be written as $k(1, 0, 0)$, so $\{(1, 0, 0)\}$ is a basis for $S(1)$. Thus $S(1)$ has dimension 1. (Geometrically, $S(1)$ is the $x$-axis.)

The eigenvalue $\lambda = 4$ has multiplicity 2.

The eigenvector equations become

$$\begin{aligned} -3x + y - z &= 0\\ 0y &= 0\\ 0z &= 0. \end{aligned}$$

The first equation gives $z = y - 3x$, and the second and third give no constraints on $y$ and $z$.

Thus the eigenvectors corresponding to the eigenvalue $\lambda = 4$ are the vectors of the form $(k, l, l - 3k)$, where $k$ and $l$ are not both 0.

The eigenspace $S(4)$ is the set of vectors $\{(k, l, l - 3k) : k, l \in \mathbb{R}\}$.

Any vector in $S(4)$ can be written as $k(1, 0, -3) + l(0, 1, 1)$, so $\{(1, 0, -3), (0, 1, 1)\}$ is a basis for $S(4)$. Thus $S(4)$ has dimension 2. (Geometrically, $S(4)$ is the plane in $\mathbb{R}^3$ with equation $-3x + y - z = 0$.)

An alternative solution comes from using the equivalent equation $x = \tfrac{1}{3}(y - z)$, and has basis $\{(\tfrac{1}{3}, 1, 0), (-\tfrac{1}{3}, 0, 1)\}$.
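Both bases for $S(4)$, and the basis for $S(1)$, can be checked exactly with rational arithmetic. This sketch uses the matrix $A$ as written at the start of this solution (its entry $-1$ in the first row is the coefficient of $z$ in the first eigenvector equation) and Python's `fractions.Fraction` so that the entries $\pm\tfrac13$ are handled without rounding.

```python
from fractions import Fraction


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


A = [[1, 1, -1], [0, 4, 0], [0, 0, 4]]

third = Fraction(1, 3)
bases = {
    1: [(1, 0, 0)],
    4: [(1, 0, -3), (0, 1, 1)],            # basis obtained from z = y - 3x
}
alternative_basis_4 = [(third, 1, 0), (-third, 0, 1)]  # from x = (y - z)/3

for lam, vectors in bases.items():
    for v in vectors:
        assert mat_vec(A, v) == [lam * x for x in v]
for v in alternative_basis_4:
    assert mat_vec(A, v) == [4 * x for x in v]

# A general vector (k, l, l - 3k) of S(4), for a few choices of k and l.
for k, l in [(2, 5), (-1, 3), (7, 0)]:
    v = (k, l, l - 3 * k)
    assert mat_vec(A, v) == [4 * x for x in v]
print("eigenspace checks passed")
```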
Solution to Exercise C124 


The matrix

$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$

is triangular, so the eigenvalues are the diagonal entries $\lambda = 1$ and $\lambda = 1$.

The eigenvector equations are

$$\begin{aligned} (1-\lambda)x + y &= 0\\ (1-\lambda)y &= 0. \end{aligned}$$

The eigenvalue $\lambda = 1$ has multiplicity 2.

The eigenvector equations become

$$\begin{aligned} 0x + y &= 0\\ 0y &= 0. \end{aligned}$$

So $y = 0$ and there are no constraints on $x$. Thus the eigenvectors corresponding to the eigenvalue $\lambda = 1$ are the vectors of the form $(k, 0)$, where $k \neq 0$.

The eigenspace $S(1)$ is the set of vectors $\{(k, 0) : k \in \mathbb{R}\}$.

Any vector in $S(1)$ can be written as $k(1, 0)$, so $\{(1, 0)\}$ is a basis for $S(1)$. Thus $S(1)$ has dimension 1. (Geometrically, $S(1)$ is the $x$-axis in $\mathbb{R}^2$.)




Solution to Exercise C125 
Let

$$A = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 4 & 1 \\ -1 & 1 & 4 \end{pmatrix}.$$

The characteristic equation is $\det(A - \lambda I) = 0$; that is,

$$\begin{vmatrix} 1-\lambda & -1 & 0 \\ 1 & 4-\lambda & 1 \\ -1 & 1 & 4-\lambda \end{vmatrix} = 0.$$

We expand the determinant and obtain

$$(1-\lambda)\begin{vmatrix} 4-\lambda & 1 \\ 1 & 4-\lambda \end{vmatrix} + \begin{vmatrix} 1 & 1 \\ -1 & 4-\lambda \end{vmatrix} = 0.$$

This simplifies to

$$(1-\lambda)\big((4-\lambda)^2 - 1\big) + \big((4-\lambda) + 1\big) = 0.$$

Using the relation $x^2 - 1 = (x - 1)(x + 1)$, where $x = 4 - \lambda$, this simplifies further to

$$(1-\lambda)(3-\lambda)(5-\lambda) + (5-\lambda) = 0,$$

and thus

$$(5-\lambda)\big((1-\lambda)(3-\lambda) + 1\big) = (5-\lambda)(\lambda^2 - 4\lambda + 4) = (5-\lambda)(\lambda - 2)^2 = 0.$$

The eigenvalues of $A$ are $\lambda = 5$, $\lambda = 2$ and $\lambda = 2$.

(As a quick check, $1 + 4 + 4 = 9 = 5 + 2 + 2$, so the sum of the eigenvalues is indeed equal to the sum of the diagonal entries.)

The eigenvector equations are

$$\begin{aligned} (1-\lambda)x - y &= 0\\ x + (4-\lambda)y + z &= 0\\ -x + y + (4-\lambda)z &= 0. \end{aligned}$$

The eigenvalue $\lambda = 5$ has multiplicity 1.

The eigenvector equations become

$$\begin{aligned} -4x - y &= 0\\ x - y + z &= 0\\ -x + y - z &= 0. \end{aligned}$$

The first equation gives $y = -4x$, and substituting this into the second gives $5x + z = 0$, which implies that $z = -5x$. The third equation is equivalent to the second.

Thus the eigenvectors corresponding to the eigenvalue $\lambda = 5$ are the vectors of the form $(k, -4k, -5k)$, where $k \neq 0$.

The eigenspace $S(5)$ is the set of vectors $\{(k, -4k, -5k) : k \in \mathbb{R}\}$.

Any vector in $S(5)$ can be written as $k(1, -4, -5)$, so $\{(1, -4, -5)\}$ is a basis for $S(5)$. Thus $S(5)$ has dimension 1.

The eigenvalue $\lambda = 2$ has multiplicity 2.

The eigenvector equations become

$$\begin{aligned} -x - y &= 0\\ x + 2y + z &= 0\\ -x + y + 2z &= 0. \end{aligned}$$

The first equation gives $y = -x$, and substituting this into the second gives $-x + z = 0$, which implies that $z = x$. These satisfy the third equation.

Thus the eigenvectors corresponding to the eigenvalue $\lambda = 2$ are the vectors of the form $(k, -k, k)$, where $k \neq 0$.

The eigenspace $S(2)$ is the set of vectors $\{(k, -k, k) : k \in \mathbb{R}\}$.

Any vector in $S(2)$ can be written as $k(1, -1, 1)$, so $\{(1, -1, 1)\}$ is a basis for $S(2)$. Thus $S(2)$ has dimension 1.
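The factorisation $(5-\lambda)(\lambda-2)^2$ and both eigenspace bases can be confirmed with a short exact check. This sketch (pure Python, standard library only) compares $\det(A - \lambda I)$ with the factorised form at thirteen integer values, which is more than enough to identify a cubic, and then multiplies each basis vector by $A$.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_value(a, lam):
    """Return det(A - lam*I) for a 3x3 matrix A."""
    return det3([[a[r][s] - (lam if r == s else 0) for s in range(3)] for r in range(3)])


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


A = [[1, -1, 0], [1, 4, 1], [-1, 1, 4]]

# det(A - lam I) agrees with (5 - lam)(lam - 2)^2 at every sample point.
for lam in range(-4, 9):
    assert char_value(A, lam) == (5 - lam) * (lam - 2) ** 2

assert mat_vec(A, (1, -4, -5)) == [5, -20, -25]   # 5 * (1, -4, -5)
assert mat_vec(A, (1, -1, 1)) == [2, -2, 2]       # 2 * (1, -1, 1)
```

Note that the repeated root $\lambda = 2$ still gives an eigenspace of dimension 1 here; the check above confirms the eigenvector but cannot by itself show that no second independent eigenvector exists, which is what the solution of the eigenvector equations establishes.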


Solution to Exercise C126 


Letting $k = 1$, we see that $(-2, 1)$ and $(1, 2)$ are eigenvectors of $t$. Since $(1, 2)$ is not a multiple of $(-2, 1)$, these two eigenvectors form a basis for $\mathbb{R}^2$.


Solution to Exercise C127 


Each of the vectors in $E$ is an eigenvector of $t$:

$$\begin{aligned} t(0, 1, -1) &= (0, 0, 0) = 0(0, 1, -1),\\ t(-2, 1, 0) &= (4, -2, 0) = -2(-2, 1, 0),\\ t(1, 0, -1) &= (-3, 0, 3) = -3(1, 0, -1). \end{aligned}$$

Thus $E$ is a basis for $\mathbb{R}^3$ consisting of eigenvectors of $t$; that is, $E$ is an eigenvector basis of $t$.


Solution to Exercise C128 


(a) The matrix of $t$ with respect to the standard basis for $\mathbb{R}^2$ is

$$\begin{pmatrix} 1 & -2 \\ -2 & -2 \end{pmatrix}.$$

(b) Following Strategy C19, first we find the images of the vectors in the basis $E = \{(-2, 1), (1, 2)\}$:

$$t(-2, 1) = (-4, 2), \qquad t(1, 2) = (-3, -6).$$

Next we find the $E$-coordinates of each of these image vectors:

$$\begin{aligned} (-4, 2) &= 2(-2, 1) + 0(1, 2) = (2, 0)_E,\\ (-3, -6) &= 0(-2, 1) - 3(1, 2) = (0, -3)_E. \end{aligned}$$

Therefore $t(-2, 1) = (2, 0)_E$ and $t(1, 2) = (0, -3)_E$. So the matrix of $t$ with respect to the eigenvector basis $E$ is

$$\begin{pmatrix} 2 & 0 \\ 0 & -3 \end{pmatrix}.$$
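The same diagonal matrix can be obtained by a change of basis: if the columns of $P$ are the eigenvectors $(-2, 1)$ and $(1, 2)$, then $P^{-1}AP$ should equal the matrix found in part (b). The minimal sketch below does this with exact fractions, and also uses the diagonal form to compute $A^5 = PD^5P^{-1}$, comparing it with repeated multiplication. The helper names `mat_mul` and `inverse2` are illustrative, not from the text.

```python
from fractions import Fraction


def mat_mul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[r][k] * y[k][c] for k in range(len(y))) for c in range(len(y[0]))]
            for r in range(len(x))]


def inverse2(m):
    """Inverse of an invertible 2x2 matrix, with exact rational entries."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1, -2], [-2, -2]]   # matrix of t with respect to the standard basis
P = [[-2, 1], [1, 2]]     # columns: the eigenvectors (-2, 1) and (1, 2)

D = mat_mul(mat_mul(inverse2(P), A), P)
print(D)  # entries equal [[2, 0], [0, -3]]

# Powers are cheap in the eigenvector basis: A^5 = P D^5 P^{-1}.
D5 = [[D[0][0] ** 5, 0], [0, D[1][1] ** 5]]
A5 = mat_mul(mat_mul(P, D5), inverse2(P))

direct = A
for _ in range(4):
    direct = mat_mul(direct, A)
assert A5 == direct == [[-23, -110], [-110, -188]]
```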
Solution to Exercise C129 
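In Exercise C127 you showed that

$$\begin{aligned} t(0, 1, -1) &= 0(0, 1, -1),\\ t(-2, 1, 0) &= -2(-2, 1, 0),\\ t(1, 0, -1) &= -3(1, 0, -1). \end{aligned}$$

So the eigenvalues of $t$ are $\lambda_1 = 0$, $\lambda_2 = -2$ and $\lambda_3 = -3$, and, by Theorem C59, the matrix of $t$ with respect to $E$ is

$$\begin{pmatrix} 0 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -3 \end{pmatrix}.$$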


Solution to Exercise C130 


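(a) $$P = \begin{pmatrix} -2 & 1 \\ 1 & 2 \end{pmatrix}$$

(b) $$P = \begin{pmatrix} 0 & -2 & 1 \\ 1 & 1 & 0 \\ -1 & 0 & -1 \end{pmatrix}$$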




Solution to Exercise C131

Let $t : \mathbb{R}^2 \to \mathbb{R}^2$ be the linear transformation given by

$$t(x, y) = (x - 2y, -2x - 2y),$$

and let $E$ be the eigenvector basis $\{(-2, 1), (1, 2)\}$ of $t$. It follows from Exercise C128 that $A$ is the matrix of $t$ with respect to the standard basis for $\mathbb{R}^2$ and $D$ is the matrix of $t$ with respect to the eigenvector basis $E$. By Theorem C62, $D = P^{-1}AP$, where $P$ is the transition matrix from $E$ to the standard basis for $\mathbb{R}^2$; that is,

$$P = \begin{pmatrix} -2 & 1 \\ 1 & 2 \end{pmatrix}.$$

Solution to Exercise C132

(a) $$D^5 = \begin{pmatrix} 2 & 0 \\ 0 & -3 \end{pmatrix}^5 = \begin{pmatrix} 32 & 0 \\ 0 & -243 \end{pmatrix}.$$

(b) We have $A^5 = PD^5P^{-1}$, where $D$ is as in part (a) and $P = \begin{pmatrix} -2 & 1 \\ 1 & 2 \end{pmatrix}$.

Since $P^{-1} = -\tfrac{1}{5}\begin{pmatrix} 2 & -1 \\ -1 & -2 \end{pmatrix}$, it follows that

$$A^5 = -\frac{1}{5}\begin{pmatrix} -2 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 32 & 0 \\ 0 & -243 \end{pmatrix}\begin{pmatrix} 2 & -1 \\ -1 & -2 \end{pmatrix} = -\frac{1}{5}\begin{pmatrix} 115 & 550 \\ 550 & 940 \end{pmatrix} = \begin{pmatrix} -23 & -110 \\ -110 & -188 \end{pmatrix}.$$

Solution to Exercise C133

(There are many solutions possible for this and for each of the remaining exercises in this section, each corresponding to a different ordering of the eigenvalues or a different choice of eigenvectors; in each case the matrix $P$ should correspond to the matrix $D$ so that $P^{-1}AP = D$.)

We use Strategy C20.

The eigenvalues of $A$ are $\lambda = 6$, $\lambda = 3$ and $\lambda = 0$.

The eigenvectors of $A$ are the non-zero vectors of the following forms:

$$\begin{aligned} &(2k, 2k, k), && \text{corresponding to } \lambda = 6,\\ &(-2k, k, 2k), && \text{corresponding to } \lambda = 3,\\ &(k, -2k, 2k), && \text{corresponding to } \lambda = 0. \end{aligned}$$

It follows from Theorem C63 that we can form an eigenvector basis of $A$ by taking one eigenvector corresponding to each of the three distinct eigenvalues. For example,

$$E = \{(2, 2, 1), (-2, 1, 2), (1, -2, 2)\}$$

is an eigenvector basis of $A$.

We use the eigenvectors in $E$ to form the columns of the transition matrix:

$$P = \begin{pmatrix} 2 & -2 & 1 \\ 2 & 1 & -2 \\ 1 & 2 & 2 \end{pmatrix}.$$

We use the eigenvalues corresponding to the eigenvectors in $E$ to form the diagonal matrix:

$$P^{-1}AP = D = \begin{pmatrix} 6 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

Solution to Exercise C134

The characteristic equation of $A$ is

$$\begin{vmatrix} 1-\lambda & 0 & 0 \\ 0 & 2-\lambda & 1 \\ 0 & 1 & 2-\lambda \end{vmatrix} = 0.$$

We expand the determinant and obtain

$$(1-\lambda)\begin{vmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{vmatrix} = 0,$$

which simplifies to

$$(1-\lambda)(\lambda^2 - 4\lambda + 3) = (1-\lambda)(\lambda - 1)(\lambda - 3) = 0.$$

The eigenvalues of $A$ are therefore $\lambda = 3$, $\lambda = 1$ and $\lambda = 1$.

To find the eigenspaces of $A$, we consider the eigenvector equations

$$\begin{aligned} (1-\lambda)x &= 0\\ (2-\lambda)y + z &= 0\\ y + (2-\lambda)z &= 0, \end{aligned}$$

for each of the eigenvalues.




$\lambda = 3$: The eigenvector equations become

$$\begin{aligned} -2x &= 0\\ -y + z &= 0\\ y - z &= 0. \end{aligned}$$

So $x = 0$ and $y = z$. Thus $S(3) = \{(0, k, k) : k \in \mathbb{R}\}$.

$\lambda = 1$: The eigenvector equations become

$$\begin{aligned} 0x &= 0\\ y + z &= 0\\ y + z &= 0. \end{aligned}$$

So $z = -y$ and there are no constraints on $x$. Thus $S(1) = \{(k, l, -l) : k, l \in \mathbb{R}\}$.

A basis for $S(3)$ is $\{(0, 1, 1)\}$ and a basis for $S(1)$ is $\{(1, 0, 0), (0, 1, -1)\}$ because any vector in $S(1)$ can be written as $k(1, 0, 0) + l(0, 1, -1)$. The set

$$E = \{(0, 1, 1), (1, 0, 0), (0, 1, -1)\}$$

contains three vectors, so it is an eigenvector basis of $A$.

We use the eigenvectors in $E$ to form the columns of the transition matrix:

$$P = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & -1 \end{pmatrix}.$$

We use the eigenvalues corresponding to the eigenvectors in $E$ to form the diagonal matrix:

$$P^{-1}AP = D = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
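Checking $P^{-1}AP = D$ does not require computing $P^{-1}$: when $P$ is invertible, the equation is equivalent to $AP = PD$, which says column by column that each column of $P$ is an eigenvector with the matching diagonal entry of $D$. The sketch below checks $\det P \neq 0$ and $AP = PD$ exactly.

```python
def mat_mul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[r][k] * y[k][c] for k in range(len(y))) for c in range(len(y[0]))]
            for r in range(len(x))]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[1, 0, 0], [0, 2, 1], [0, 1, 2]]
P = [[0, 1, 0], [1, 0, 1], [1, 0, -1]]   # columns (0,1,1), (1,0,0), (0,1,-1)
D = [[3, 0, 0], [0, 1, 0], [0, 0, 1]]

assert det3(P) != 0                        # P is invertible, so E is a basis
assert mat_mul(A, P) == mat_mul(P, D)      # equivalent to P^{-1} A P = D
```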


Solution to Exercise C135 


(a) We have

$$\begin{aligned} (2k, 2k, k) \cdot (-2l, l, 2l) &= -4kl + 2kl + 2kl = 0,\\ (2k, 2k, k) \cdot (m, -2m, 2m) &= 2km - 4km + 2km = 0,\\ (-2l, l, 2l) \cdot (m, -2m, 2m) &= -2lm - 2lm + 4lm = 0. \end{aligned}$$

Thus the given vectors form an orthogonal set. Since there are three of them, they form an orthogonal basis for $\mathbb{R}^3$.




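(b) $$\begin{aligned} |\mathbf{v}_1| &= |(2k, 2k, k)| = \sqrt{4k^2 + 4k^2 + k^2} = \sqrt{9k^2} = 3|k|,\\ |\mathbf{v}_2| &= |(-2l, l, 2l)| = \sqrt{4l^2 + l^2 + 4l^2} = \sqrt{9l^2} = 3|l|,\\ |\mathbf{v}_3| &= |(m, -2m, 2m)| = \sqrt{m^2 + 4m^2 + 4m^2} = \sqrt{9m^2} = 3|m|. \end{aligned}$$

Thus $|\mathbf{v}_1| = |\mathbf{v}_2| = |\mathbf{v}_3| = 1$ if $k = l = m = \tfrac{1}{3}$.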
Solution to Exercise C136 
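We calculate $P^TP$:

$$P^TP = \begin{pmatrix} \tfrac23 & \tfrac23 & \tfrac13 \\ -\tfrac23 & \tfrac13 & \tfrac23 \\ \tfrac13 & -\tfrac23 & \tfrac23 \end{pmatrix}\begin{pmatrix} \tfrac23 & -\tfrac23 & \tfrac13 \\ \tfrac23 & \tfrac13 & -\tfrac23 \\ \tfrac13 & \tfrac23 & \tfrac23 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = I.$$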
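The product $P^TP$ can be reproduced with exact fractions. This sketch takes $P$ to be $\tfrac13\begin{pmatrix} 2 & -2 & 1 \\ 2 & 1 & -2 \\ 1 & 2 & 2 \end{pmatrix}$, as in the calculation above, and also checks $PP^T = I$, which for a square matrix follows from $P^TP = I$.

```python
from fractions import Fraction


def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def mat_mul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[r][k] * y[k][c] for k in range(len(y))) for c in range(len(y[0]))]
            for r in range(len(x))]


third = Fraction(1, 3)
P = [[third * x for x in row] for row in [[2, -2, 1], [2, 1, -2], [1, 2, 2]]]
identity = [[1 if r == c else 0 for c in range(3)] for r in range(3)]

assert mat_mul(transpose(P), P) == identity   # columns are orthonormal
assert mat_mul(P, transpose(P)) == identity   # so the rows are orthonormal too
```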
Solution to Exercise C137 


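(a) We use Strategy C22.

The characteristic equation of $A$ is

$$\begin{vmatrix} 9-\lambda & -2 \\ -2 & 6-\lambda \end{vmatrix} = 0.$$

We expand the determinant and obtain

$$(9-\lambda)(6-\lambda) - 4 = 0,$$

which simplifies to

$$\lambda^2 - 15\lambda + 50 = (\lambda - 10)(\lambda - 5) = 0.$$

The eigenvalues of $A$ are therefore $\lambda = 10$ and $\lambda = 5$.

Next we find orthonormal bases for the eigenspaces.

The eigenvector equations are

$$\begin{aligned} (9-\lambda)x - 2y &= 0\\ -2x + (6-\lambda)y &= 0. \end{aligned}$$

$\lambda = 10$: The eigenvector equations become

$$\begin{aligned} -x - 2y &= 0\\ -2x - 4y &= 0. \end{aligned}$$

These equations are equivalent to the single equation $x + 2y = 0$, that is, $x = -2y$. Thus the eigenvectors corresponding to $\lambda = 10$ are the non-zero vectors of the form $(-2k, k)$.

An eigenvector of magnitude 1 corresponding to $\lambda = 10$ is

$$\left(-\tfrac{2}{\sqrt5}, \tfrac{1}{\sqrt5}\right).$$

$\lambda = 5$: The eigenvector equations become

$$\begin{aligned} 4x - 2y &= 0\\ -2x + y &= 0. \end{aligned}$$

These equations are equivalent to the single equation $2x - y = 0$, that is, $y = 2x$. Thus the eigenvectors corresponding to $\lambda = 5$ are the non-zero vectors of the form $(k, 2k)$.

An eigenvector of magnitude 1 corresponding to $\lambda = 5$ is

$$\left(\tfrac{1}{\sqrt5}, \tfrac{2}{\sqrt5}\right).$$

It follows from Theorem C64 that an orthonormal eigenvector basis of $A$ is

$$E = \left\{\left(-\tfrac{2}{\sqrt5}, \tfrac{1}{\sqrt5}\right), \left(\tfrac{1}{\sqrt5}, \tfrac{2}{\sqrt5}\right)\right\}.$$

We use the eigenvectors in $E$ to form the columns of the transition matrix:

$$P = \begin{pmatrix} -\tfrac{2}{\sqrt5} & \tfrac{1}{\sqrt5} \\ \tfrac{1}{\sqrt5} & \tfrac{2}{\sqrt5} \end{pmatrix}.$$

We use the eigenvalues corresponding to the eigenvectors in $E$ to form the diagonal matrix:

$$P^TAP = D = \begin{pmatrix} 10 & 0 \\ 0 & 5 \end{pmatrix}.$$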


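(b) The eigenvalues of $A$ are given as $\lambda = 6$, $\lambda = 3$ and $\lambda = 2$.

Now we find an orthonormal eigenvector basis of $A$.

The eigenvector equations are

$$\begin{aligned} (5-\lambda)x - y - z &= 0\\ -x + (3-\lambda)y + z &= 0\\ -x + y + (3-\lambda)z &= 0. \end{aligned}$$

$\lambda = 6$: The eigenvector equations become

$$\begin{aligned} -x - y - z &= 0\\ -x - 3y + z &= 0\\ -x + y - 3z &= 0. \end{aligned}$$

Adding the first and second equations together, we obtain $-2x - 4y = 0$, so $x = -2y$. Substituting this into the third equation, we obtain $3y - 3z = 0$, so $z = y$. Thus the eigenvectors corresponding to $\lambda = 6$ are the non-zero vectors of the form $(-2k, k, k)$.

An eigenvector of magnitude 1 corresponding to $\lambda = 6$ is

$$\left(-\tfrac{2}{\sqrt6}, \tfrac{1}{\sqrt6}, \tfrac{1}{\sqrt6}\right).$$

$\lambda = 3$: The eigenvector equations become

$$\begin{aligned} 2x - y - z &= 0\\ -x + z &= 0\\ -x + y &= 0. \end{aligned}$$

The second and third equations imply that $z = x$ and $y = x$. These satisfy the first equation. Thus the eigenvectors corresponding to $\lambda = 3$ are the non-zero vectors of the form $(k, k, k)$.

An eigenvector of magnitude 1 corresponding to $\lambda = 3$ is

$$\left(\tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}\right).$$

$\lambda = 2$: The eigenvector equations become

$$\begin{aligned} 3x - y - z &= 0\\ -x + y + z &= 0\\ -x + y + z &= 0. \end{aligned}$$

Adding the first and second equations together, we obtain $2x = 0$, which implies that $x = 0$. Substituting this into the third equation, we obtain $y + z = 0$, which implies that $z = -y$. Thus the eigenvectors corresponding to $\lambda = 2$ are the non-zero vectors of the form $(0, k, -k)$.

An eigenvector of magnitude 1 corresponding to $\lambda = 2$ is

$$\left(0, \tfrac{1}{\sqrt2}, -\tfrac{1}{\sqrt2}\right).$$

It follows from Theorem C64 that an orthonormal eigenvector basis of $A$ is

$$E = \left\{\left(-\tfrac{2}{\sqrt6}, \tfrac{1}{\sqrt6}, \tfrac{1}{\sqrt6}\right), \left(\tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}\right), \left(0, \tfrac{1}{\sqrt2}, -\tfrac{1}{\sqrt2}\right)\right\}.$$

We use the eigenvectors in $E$ to form the columns of the transition matrix:

$$P = \begin{pmatrix} -\tfrac{2}{\sqrt6} & \tfrac{1}{\sqrt3} & 0 \\ \tfrac{1}{\sqrt6} & \tfrac{1}{\sqrt3} & \tfrac{1}{\sqrt2} \\ \tfrac{1}{\sqrt6} & \tfrac{1}{\sqrt3} & -\tfrac{1}{\sqrt2} \end{pmatrix}.$$

We use the eigenvalues corresponding to the eigenvectors in $E$ to form the diagonal matrix:

$$P^TAP = D = \begin{pmatrix} 6 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$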
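Because $P$ is orthogonal in both parts, $P^{-1} = P^T$, so the diagonalisation can be checked by computing $P^TP$ and $P^TAP$. Square roots make the entries irrational, so this minimal sketch uses floating-point arithmetic and compares matrices entry by entry with a small tolerance. The matrix in part (b) is the one read off from its eigenvector equations.

```python
import math


def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def mat_mul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[r][k] * y[k][c] for k in range(len(y))) for c in range(len(y[0]))]
            for r in range(len(x))]


def close(x, y, tol=1e-9):
    """True if two matrices agree entry by entry to within tol."""
    return all(abs(a - b) < tol for ra, rb in zip(x, y) for a, b in zip(ra, rb))


def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


r2, r3, r5, r6 = (math.sqrt(n) for n in (2, 3, 5, 6))

cases = {
    "(a)": ([[9, -2], [-2, 6]],
            [[-2 / r5, 1 / r5], [1 / r5, 2 / r5]],
            [[10, 0], [0, 5]]),
    "(b)": ([[5, -1, -1], [-1, 3, 1], [-1, 1, 3]],
            [[-2 / r6, 1 / r3, 0], [1 / r6, 1 / r3, 1 / r2], [1 / r6, 1 / r3, -1 / r2]],
            [[6, 0, 0], [0, 3, 0], [0, 0, 2]]),
}

for name, (A, P, D) in cases.items():
    assert close(mat_mul(transpose(P), P), identity(len(P))), name  # P orthogonal
    assert close(mat_mul(mat_mul(transpose(P), A), P), D), name     # P^T A P = D
```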


Solution to Exercise C138 
We use Strategy C22. 


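A basis for the eigenspace $S(3)$ is $\{(0, 1, 1)\}$, so an orthonormal basis for $S(3)$ is

$$\left\{\left(0, \tfrac{1}{\sqrt2}, \tfrac{1}{\sqrt2}\right)\right\}.$$

A basis for the eigenspace $S(1)$ is $\{(1, 0, 0), (0, 1, -1)\}$.

These two basis vectors are orthogonal since

$$(1, 0, 0) \cdot (0, 1, -1) = 0.$$

An orthonormal basis for $S(1)$ is therefore

$$\left\{(1, 0, 0), \left(0, \tfrac{1}{\sqrt2}, -\tfrac{1}{\sqrt2}\right)\right\}.$$

By Theorem C64 an orthonormal eigenvector basis of $A$ is therefore

$$E = \left\{\left(0, \tfrac{1}{\sqrt2}, \tfrac{1}{\sqrt2}\right), (1, 0, 0), \left(0, \tfrac{1}{\sqrt2}, -\tfrac{1}{\sqrt2}\right)\right\}.$$

We use the eigenvectors in $E$ to form the columns of the transition matrix:

$$P = \begin{pmatrix} 0 & 1 & 0 \\ \tfrac{1}{\sqrt2} & 0 & \tfrac{1}{\sqrt2} \\ \tfrac{1}{\sqrt2} & 0 & -\tfrac{1}{\sqrt2} \end{pmatrix}.$$

We use the eigenvalues corresponding to the eigenvectors in $E$ to form the diagonal matrix:

$$P^TAP = D = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$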
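An orthonormal eigenvector basis has two properties that can be checked vector by vector: each vector $\mathbf{u}$ satisfies $A\mathbf{u} = \lambda\mathbf{u}$, and the vectors have pairwise dot products 0 and magnitude 1. This sketch checks both for the basis $E$ above, using the matrix $A$ of Exercise C134 and floating-point comparisons via `math.isclose`.

```python
import math


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def dot(u, v):
    """Scalar (dot) product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


s = 1 / math.sqrt(2)
A = [[1, 0, 0], [0, 2, 1], [0, 1, 2]]
E = [(0, s, s), (1, 0, 0), (0, s, -s)]
eigenvalues = [3, 1, 1]

for i, (u, lam) in enumerate(zip(E, eigenvalues)):
    # u is an eigenvector with eigenvalue lam
    assert all(math.isclose(a, lam * b, abs_tol=1e-12) for a, b in zip(mat_vec(A, u), u))
    # u has magnitude 1 and is orthogonal to the other basis vectors
    for j, w in enumerate(E):
        expected = 1.0 if i == j else 0.0
        assert math.isclose(dot(u, w), expected, abs_tol=1e-12)
```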


Solution to Exercise C139 


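By Theorem C65, to prove that the product $PQ$ is orthogonal it is sufficient to show that

$$(PQ)^T = (PQ)^{-1}.$$

Since $P$ and $Q$ are orthogonal, $P^T = P^{-1}$ and $Q^T = Q^{-1}$, so

$$(PQ)^T = Q^TP^T = Q^{-1}P^{-1} = (PQ)^{-1}.$$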
Solution to Exercise C140 


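(a) To verify that $A$ is orthogonal, it is sufficient to show that $A^TA = I$, by Theorem C65. We have

$$A^TA = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}\begin{pmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = I,$$

so $A$ is orthogonal.

(Alternatively, we could have shown that the vectors $(0, 0, 1)$, $(0, 1, 0)$ and $(-1, 0, 0)$ form an orthonormal basis for $\mathbb{R}^3$.)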
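Both facts used for this matrix can be confirmed exactly in a few lines: $A^TA = I$ shows that $A$ is orthogonal, and its determinant (computed here by cofactor expansion along the first row) is 1, the value that identifies an orthogonal $3 \times 3$ matrix as a rotation.

```python
def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def mat_mul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[r][k] * y[k][c] for k in range(len(y))) for c in range(len(y[0]))]
            for r in range(len(x))]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[0, 0, -1], [0, 1, 0], [1, 0, 0]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

assert mat_mul(transpose(A), A) == I3   # A is orthogonal
print(det3(A))  # 1
```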




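(b) We evaluate the determinant of $A$:

$$\begin{vmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{vmatrix} = 0 - 0 + (-1)\begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} = (-1)(-1) = 1.$$

Therefore $A$ represents a rotation of $\mathbb{R}^3$.

Solution to Exercise C141

(a) The ellipse with equation

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

is written in matrix form as

$$\mathbf{x}^T\begin{pmatrix} 1/a^2 & 0 \\ 0 & 1/b^2 \end{pmatrix}\mathbf{x} + (0\ \ 0)\mathbf{x} - 1 = 0.$$

So the ellipse in standard position has

$$A = \begin{pmatrix} 1/a^2 & 0 \\ 0 & 1/b^2 \end{pmatrix} \quad\text{and}\quad J = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

(b) The hyperbola with equation

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$$

is written in matrix form as

$$\mathbf{x}^T\begin{pmatrix} 1/a^2 & 0 \\ 0 & -1/b^2 \end{pmatrix}\mathbf{x} + (0\ \ 0)\mathbf{x} - 1 = 0.$$

So the hyperbola in standard position has

$$A = \begin{pmatrix} 1/a^2 & 0 \\ 0 & -1/b^2 \end{pmatrix} \quad\text{and}\quad J = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

(c) The parabola with equation

$$y^2 = 4ax$$

is written in matrix form as

$$\mathbf{x}^T\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}\mathbf{x} + (-4a\ \ 0)\mathbf{x} + 0 = 0.$$

So the parabola in standard position has

$$A = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad J = \begin{pmatrix} -4a \\ 0 \end{pmatrix}.$$

Solution to Exercise C142

By Theorem C64 an orthonormal eigenvector basis of $A$ for the eigenvalues $\lambda = 2$ and $\lambda = -3$, in that order, is

$$E = \left\{\left(-\tfrac{2}{\sqrt5}, \tfrac{1}{\sqrt5}\right), \left(\tfrac{1}{\sqrt5}, \tfrac{2}{\sqrt5}\right)\right\}.$$

We use the eigenvectors in $E$ to form the columns of the transition matrix:

$$P = \begin{pmatrix} -\tfrac{2}{\sqrt5} & \tfrac{1}{\sqrt5} \\ \tfrac{1}{\sqrt5} & \tfrac{2}{\sqrt5} \end{pmatrix}.$$

We use the eigenvalues to form the diagonal matrix

$$P^TAP = D = \begin{pmatrix} 2 & 0 \\ 0 & -3 \end{pmatrix}.$$

It follows from equation (9) that the equation of the conic is now

$$(x'\ \ y')\begin{pmatrix} 2 & 0 \\ 0 & -3 \end{pmatrix}\begin{pmatrix} x' \\ y' \end{pmatrix} + (6\ \ 12)\begin{pmatrix} -\tfrac{2}{\sqrt5} & \tfrac{1}{\sqrt5} \\ \tfrac{1}{\sqrt5} & \tfrac{2}{\sqrt5} \end{pmatrix}\begin{pmatrix} x' \\ y' \end{pmatrix} + 21 = 0,$$

that is,

$$2(x')^2 - 3(y')^2 + 6\sqrt5\,y' + 21 = 0.$$

Solution to Exercise C143

We have

$$2(x')^2 - 3(y')^2 + 6\sqrt5\,y' + 21 = 0,$$

which is equivalent to

$$2(x')^2 - 3\left((y')^2 - 2\sqrt5\,y'\right) + 21 = 0.$$

Completing the square gives

$$2(x')^2 - 3(y' - \sqrt5)^2 + 15 + 21 = 0,$$

that is,

$$2(x')^2 - 3(y' - \sqrt5)^2 + 36 = 0.$$

We set the new coordinates to be

$$\begin{pmatrix} x'' \\ y'' \end{pmatrix} = \begin{pmatrix} x' \\ y' - \sqrt5 \end{pmatrix},$$

so substitute $x'' = x'$ and $y'' = y' - \sqrt5$. The equation of the conic is now

$$2(x'')^2 - 3(y'')^2 = -36,$$

or

$$\frac{(y'')^2}{12} - \frac{(x'')^2}{18} = 1.$$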
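Completing the square is an identity in $x'$ and $y'$, so the two forms of the equation should take the same value at every point once $x'' = x'$ and $y'' = y' - \sqrt5$ are substituted. This sketch evaluates both left-hand sides at random points (with a fixed seed so the run is reproducible) and compares them with `math.isclose`.

```python
import math
import random

SQRT5 = math.sqrt(5)


def before(x1, y1):
    """Left-hand side 2(x')^2 - 3(y')^2 + 6*sqrt(5)*y' + 21."""
    return 2 * x1**2 - 3 * y1**2 + 6 * SQRT5 * y1 + 21


def after(x1, y1):
    """2(x'')^2 - 3(y'')^2 + 36 with x'' = x' and y'' = y' - sqrt(5)."""
    x2, y2 = x1, y1 - SQRT5
    return 2 * x2**2 - 3 * y2**2 + 36


rng = random.Random(0)
for _ in range(100):
    x1, y1 = rng.uniform(-10, 10), rng.uniform(-10, 10)
    assert math.isclose(before(x1, y1), after(x1, y1), rel_tol=1e-9, abs_tol=1e-9)
```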




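Solution to Exercise C144

1. Introduce matrices.

We have

$$A = \begin{pmatrix} 9 & -2 \\ -2 & 6 \end{pmatrix}, \qquad J = \begin{pmatrix} -10 \\ -20 \end{pmatrix}.$$

2. Align the axes.

We have

$$P^TAP = \begin{pmatrix} 10 & 0 \\ 0 & 5 \end{pmatrix},$$

where

$$P = \begin{pmatrix} -\tfrac{2}{\sqrt5} & \tfrac{1}{\sqrt5} \\ \tfrac{1}{\sqrt5} & \tfrac{2}{\sqrt5} \end{pmatrix}.$$

So

$$(f\ \ g) = (-10\ \ -20)\begin{pmatrix} -\tfrac{2}{\sqrt5} & \tfrac{1}{\sqrt5} \\ \tfrac{1}{\sqrt5} & \tfrac{2}{\sqrt5} \end{pmatrix} = \left(0\ \ -\tfrac{50}{\sqrt5}\right) = (0\ \ -10\sqrt5).$$

The equation of the conic is now

$$10(x')^2 + 5(y')^2 - 10\sqrt5\,y' - 5 = 0.$$

Dividing through by 5, we obtain

$$2(x')^2 + (y')^2 - 2\sqrt5\,y' - 1 = 0.$$

3. Translate the origin.

We write this equation as

$$2(x')^2 + \left((y')^2 - 2\sqrt5\,y'\right) - 1 = 0.$$

Completing the square in this equation, we obtain

$$2(x')^2 + (y' - \sqrt5)^2 - 5 - 1 = 0.$$

Substituting $x'' = x'$ and $y'' = y' - \sqrt5$ in this equation and simplifying, we obtain

$$2(x'')^2 + (y'')^2 - 6 = 0.$$

The equation of the conic in standard form is

$$\frac{(x'')^2}{3} + \frac{(y'')^2}{6} = 1.$$

The conic is an ellipse.

Solution to Exercise C145

1. Introduce matrices.

We have

$$A = \begin{pmatrix} 1 & -2 \\ -2 & 4 \end{pmatrix}.$$

2. Align the axes.

The characteristic equation of $A$ is

$$\begin{vmatrix} 1-\lambda & -2 \\ -2 & 4-\lambda \end{vmatrix} = 0.$$

We expand the determinant and obtain

$$(1-\lambda)(4-\lambda) - 4 = 0,$$

which simplifies to

$$\lambda^2 - 5\lambda = \lambda(\lambda - 5) = 0.$$

The eigenvalues of $A$ are 5 and 0.

The eigenvector equations are

$$\begin{aligned} (1-\lambda)x - 2y &= 0\\ -2x + (4-\lambda)y &= 0. \end{aligned}$$

$\lambda = 5$: The eigenvector equations become

$$\begin{aligned} -4x - 2y &= 0\\ -2x - y &= 0. \end{aligned}$$

These equations are equivalent to the single equation $2x + y = 0$, which implies that $y = -2x$. Thus the eigenvectors corresponding to $\lambda = 5$ are the non-zero vectors of the form $(k, -2k)$.

An orthonormal basis for $S(5)$ is

$$\left\{\left(\tfrac{1}{\sqrt5}, -\tfrac{2}{\sqrt5}\right)\right\}.$$

$\lambda = 0$: The eigenvector equations become

$$\begin{aligned} x - 2y &= 0\\ -2x + 4y &= 0. \end{aligned}$$

These equations are equivalent to the single equation $x - 2y = 0$, which implies that $x = 2y$. Thus the eigenvectors corresponding to $\lambda = 0$ are the non-zero vectors of the form $(2k, k)$.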




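An orthonormal basis for $S(0)$ is

$$\left\{\left(\tfrac{2}{\sqrt5}, \tfrac{1}{\sqrt5}\right)\right\}.$$

By Theorem C64 an orthonormal eigenvector basis of $A$ is therefore

$$E = \left\{\left(\tfrac{1}{\sqrt5}, -\tfrac{2}{\sqrt5}\right), \left(\tfrac{2}{\sqrt5}, \tfrac{1}{\sqrt5}\right)\right\}.$$

We use the eigenvectors in $E$ to form the columns of the transition matrix:

$$P = \begin{pmatrix} \tfrac{1}{\sqrt5} & \tfrac{2}{\sqrt5} \\ -\tfrac{2}{\sqrt5} & \tfrac{1}{\sqrt5} \end{pmatrix}.$$

So

$$(f\ \ g) = J^TP = \left(\tfrac{10}{\sqrt5}\ \ -\tfrac{20}{\sqrt5}\right) = (2\sqrt5\ \ -4\sqrt5).$$

The equation of the conic is now

$$5(x')^2 + 2\sqrt5\,x' - 4\sqrt5\,y' + 5 = 0.$$

3. Translate the origin.

We rewrite this equation by taking out the coefficient of the $(x')^2$ term to get

$$5\left((x')^2 + \tfrac{2}{\sqrt5}x'\right) - 4\sqrt5\,y' + 5 = 0.$$

Completing the square in this equation, we obtain

$$5\left(x' + \tfrac{1}{\sqrt5}\right)^2 - 1 - 4\sqrt5\,y' + 5 = 0.$$

We substitute

$$x'' = x' + \tfrac{1}{\sqrt5}$$

into this equation and rewrite it by taking out the coefficient of the $y'$ term to get

$$5(x'')^2 - 4\sqrt5\left(y' - \tfrac{1}{\sqrt5}\right) = 0.$$

We substitute

$$y'' = y' - \tfrac{1}{\sqrt5}$$

to obtain

$$5(x'')^2 - 4\sqrt5\,y'' = 0.$$

The equation of the conic in standard form is

$$(x'')^2 = \tfrac{4}{\sqrt5}\,y''.$$

The conic is a parabola.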
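The translation steps in Exercises C144 and C145 are algebraic identities, so each pair of equations can be compared numerically at random points. In this sketch the C144 standard form is multiplied back by 5 (undoing the division by 5), and the C145 substitutions are $x'' = x' + 1/\sqrt5$ and $y'' = y' - 1/\sqrt5$; the function names are illustrative only.

```python
import math
import random

S5 = math.sqrt(5)


def c144_aligned(x1, y1):
    """C144 after aligning the axes: 10(x')^2 + 5(y')^2 - 10*sqrt5*y' - 5."""
    return 10 * x1**2 + 5 * y1**2 - 10 * S5 * y1 - 5


def c144_translated(x1, y1):
    """5 * (2(x'')^2 + (y'')^2 - 6) with x'' = x', y'' = y' - sqrt5."""
    x2, y2 = x1, y1 - S5
    return 5 * (2 * x2**2 + y2**2 - 6)


def c145_aligned(x1, y1):
    """C145 after aligning the axes: 5(x')^2 + 2*sqrt5*x' - 4*sqrt5*y' + 5."""
    return 5 * x1**2 + 2 * S5 * x1 - 4 * S5 * y1 + 5


def c145_translated(x1, y1):
    """5(x'')^2 - 4*sqrt5*y'' with x'' = x' + 1/sqrt5, y'' = y' - 1/sqrt5."""
    x2, y2 = x1 + 1 / S5, y1 - 1 / S5
    return 5 * x2**2 - 4 * S5 * y2


rng = random.Random(1)
for _ in range(200):
    x1, y1 = rng.uniform(-5, 5), rng.uniform(-5, 5)
    assert math.isclose(c144_aligned(x1, y1), c144_translated(x1, y1), abs_tol=1e-9)
    assert math.isclose(c145_aligned(x1, y1), c145_translated(x1, y1), abs_tol=1e-9)
```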


Solution to Exercise C146 


1. Introduce matrices.

We have

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad J = \begin{pmatrix} -2 \\ 4 \\ -6 \end{pmatrix}.$$

2. Align the axes.

The matrix is already in diagonal form. (The axes of the quadric are parallel to the $x$-axis, $y$-axis and $z$-axis of $\mathbb{R}^3$.)

3. Translate the origin.

We write the equation as

$$(x^2 - 2x) + (y^2 + 4y) + (z^2 - 6z) - 11 = 0.$$

Completing the squares in this equation, we obtain

$$(x - 1)^2 - 1 + (y + 2)^2 - 4 + (z - 3)^2 - 9 - 11 = 0.$$

Substituting

$$x' = x - 1, \quad y' = y + 2 \quad\text{and}\quad z' = z - 3$$

in this equation and simplifying, we obtain

$$(x')^2 + (y')^2 + (z')^2 - 25 = 0.$$

The equation of the quadric in standard form is

$$\frac{(x')^2}{25} + \frac{(y')^2}{25} + \frac{(z')^2}{25} = 1.$$

This is the equation of an ellipsoid.

(This ellipsoid is in fact a sphere since $a = b = c = 5$; all the curves of intersection are circles.)
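Because all coefficients here are integers, the completing-the-square step can be verified exactly on a grid of integer points. The sketch also checks one point at distance 5 from the centre $(1, -2, 3)$, consistent with the sphere of radius 5.

```python
def original(x, y, z):
    """x^2 + y^2 + z^2 - 2x + 4y - 6z - 11, the left-hand side before translation."""
    return x**2 + y**2 + z**2 - 2 * x + 4 * y - 6 * z - 11


def translated(x, y, z):
    """(x')^2 + (y')^2 + (z')^2 - 25 with x' = x - 1, y' = y + 2, z' = z - 3."""
    x1, y1, z1 = x - 1, y + 2, z - 3
    return x1**2 + y1**2 + z1**2 - 25


for x in range(-6, 7):
    for y in range(-6, 7):
        for z in range(-6, 7):
            assert original(x, y, z) == translated(x, y, z)

# (6, -2, 3) is 5 units from the centre (1, -2, 3), so it lies on the sphere.
assert original(6, -2, 3) == 0
```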


Solution to Exercise C147 


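1. Introduce matrices. We have

$$A = \begin{pmatrix} 4 & 2 & 0 \\ 2 & 3 & 2 \\ 0 & 2 & 2 \end{pmatrix}, \qquad J = \begin{pmatrix} 12 \\ 0 \\ 12 \end{pmatrix}.$$

2. Align the axes.

We have

$$P^TAP = \begin{pmatrix} 6 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

where

$$P = \begin{pmatrix} \tfrac23 & -\tfrac23 & \tfrac13 \\ \tfrac23 & \tfrac13 & -\tfrac23 \\ \tfrac13 & \tfrac23 & \tfrac23 \end{pmatrix}.$$

So

$$(f\ \ g\ \ k) = (12\ \ 0\ \ 12)\begin{pmatrix} \tfrac23 & -\tfrac23 & \tfrac13 \\ \tfrac23 & \tfrac13 & -\tfrac23 \\ \tfrac13 & \tfrac23 & \tfrac23 \end{pmatrix} = (12\ \ 0\ \ 12).$$

The equation of the quadric is now

$$6(x')^2 + 3(y')^2 + 12x' + 12z' + 18 = 0.$$

3. Translate the origin.

We write this equation as

$$6\left((x')^2 + 2x'\right) + 3(y')^2 + 12z' + 18 = 0.$$

Completing the square in this equation, we obtain

$$6(x' + 1)^2 - 6 + 3(y')^2 + 12z' + 18 = 0.$$

Substituting

$$x'' = x' + 1, \quad y'' = y' \quad\text{and}\quad z'' = z' + 1$$

in this equation and simplifying, we obtain

$$2(x'')^2 + (y'')^2 + 4z'' = 0.$$

The equation of the quadric in standard form is

$$\frac{(x'')^2}{2} + \frac{(y'')^2}{4} = -z''.$$

This is the equation of an elliptic paraboloid.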
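Every step of this solution can be checked with exact rational arithmetic: that $P^TAP$ is the stated diagonal matrix, that the linear coefficients become $J^TP = (12\ \ 0\ \ 12)$, and that the translated equation (multiplied back by 3, since the solution divides through by 3) agrees with the aligned one at every point of an integer grid.

```python
from fractions import Fraction


def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def mat_mul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[r][k] * y[k][c] for k in range(len(y))) for c in range(len(y[0]))]
            for r in range(len(x))]


A = [[4, 2, 0], [2, 3, 2], [0, 2, 2]]
J = [12, 0, 12]
third = Fraction(1, 3)
P = [[third * x for x in row] for row in [[2, -2, 1], [2, 1, -2], [1, 2, 2]]]

# Step 2: P^T A P is diagonal, and the linear coefficients become J^T P.
assert mat_mul(mat_mul(transpose(P), A), P) == [[6, 0, 0], [0, 3, 0], [0, 0, 0]]
fgk = [sum(J[i] * P[i][j] for i in range(3)) for j in range(3)]
assert fgk == [12, 0, 12]


def aligned(x1, y1, z1):
    """6(x')^2 + 3(y')^2 + 12x' + 12z' + 18."""
    return 6 * x1**2 + 3 * y1**2 + 12 * x1 + 12 * z1 + 18


def translated(x1, y1, z1):
    """3 * (2(x'')^2 + (y'')^2 + 4z'') with x'' = x' + 1, y'' = y', z'' = z' + 1."""
    x2, y2, z2 = x1 + 1, y1, z1 + 1
    return 3 * (2 * x2**2 + y2**2 + 4 * z2)


for x1 in range(-4, 5):
    for y1 in range(-4, 5):
        for z1 in range(-4, 5):
            assert aligned(x1, y1, z1) == translated(x1, y1, z1)
```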




Acknowledgements 




Grateful acknowledgement is made to the following sources. 


Cover image: © Mark Owen 


Unit C1 


Colin Maclaurin (Subsection 5.1): NYPL / Science Source / Photo 
Researchers / Universal Images Group 


Unit C2 


Erhard Schmidt (Subsection 5.3): Taken from: 
http://www-history.mcs.st-andrews.ac.uk/PictDisplay/Schmidt.html


Unit C4 


David Hilbert (Subsection 1.1): ‘[Photograph]: David Hilbert.’ American 
Journal of Mathematics, vol. 29, no. 1, 1907. JSTOR, JSTOR, 
www.jstor.org/stable/2369910 


Werner Heisenberg (Subsection 1.1): © Historical / Contributor / Getty 


Sergey Brin and Larry Page (Subsection 1.2): © James Leynse / Contributor / Getty


Gaspard Monge (Subsection 4.2): ‘Gaspard Monge, Comte de Peluse. 
Lithograph.’ Credit: Wellcome Collection. CC BY 


Jean Nicolas Pierre Hachette (Subsection 4.2): ROYAL INSTITUTION 
OF GREAT BRITAIN / SCIENCE PHOTO LIBRARY 


Every effort has been made to contact copyright holders. If any have been 
inadvertently overlooked the publishers will be pleased to make the 
necessary arrangements at the first opportunity. 




Index 


abelian group 45, 110, 114, 115 
addition in Mm,n 44 
addition, vector see vector addition 
additive identity 115 

in Mm,n 44

in Rⁿ 109
additive inverse 115 

in Mm,n 44, 45 

in Rⁿ 109
align the axes 338 
arithmetic 

in Mm,n 44

in Rⁿ 109
associativity 44, 109, 111, 115 
augmented matrix 23, 141 
axiom 110, 115 


basis 144, 165, 205, 218, 250, see also 
non-standard basis 
strategy 146, 157 
strategy for Imt 251 
Basis Theorem 156 
Brin, Sergey 292 


Cauchy, Augustin-Louis 73 
characteristic equation 288, 291 
circle 334 
Clasen, Bernard Isidore 13 
closure 44, 108-112, 115, 160 
coefficient 9 
coefficient matrix 55, 288 
cofactor 79 
collinear 139 
column vector 41 
common factor 296 
commutativity 44, 109, 115 
completing the square 341 
complex number 119, 150, 158 
composition 232 
Composition Rule 234 
conic 334 
equation in R² 336
matrix form 337 
standard form of equation 336 
standard position 334 
strategy 345 


consistent system 10 
constant term 9 
coplanar 139 

Cramer’s rule 74 
Cramer, Gabriel 74 
curves of intersection 347 


degenerate quadric 348 
determinant 74, 77, 80, 288 
2 × 2 matrix 74
3 × 3 matrix 77
n × n matrix 80
n × n strategy 80
properties 82, 83, 87 
diagonal entry 291, 298 
diagonal matrix 50, 307, 308 
theorem 298 
diagonalisable 314 
diagonalisation 314, 321 
strategy 316 
dilation 199 
matrix representation 202 
dim V 156 
dimension 154, 156, 165, 243, 247 
C 158 
Mm,n 158 
Pn 158
Dimension Theorem 259 
distributivity 112, 116 
in Mn,n 56 
matrix multiplication 50 
scalar multiplication 46 


E-coordinate 151-153, 305 
E-coordinate representation 217, 309 
eigenspace 299, 300, 304, 328 
eigenvalue 285, 291 
strategy 293 
theorem 326 
eigenvector 285, 291 
equations 288-289 
strategy 293 
theorem 326 
eigenvector basis 305, 308, 315 
strategy 319 
theorem 307, 317, 321 




elementary matrix 67, 71 
theorem 69, 83 


elementary operations (G-J elimination) 15, 25, 82-85
elimination (Gauss-Jordan) 141, 255 
ellipse 334 
ellipsoid 348 
elliptic cone 348 
elliptic paraboloid 348 
equal matrices 42 
equate corresponding coefficients 127 
Euclidean space 106 
Euler, Leonhard 350 
extended orthogonal set 178 


factorising 296 
finite dimension 154 


Gauss, Carl Friedrich 13, 73 
Gauss-Jordan elimination 141, 255 
elementary operations 15, 25, 82-85 
linear equations 12-15, 25-26 
matrices 25 
strategy 39 
Gram, Jørgen Pedersen 176 
Gram-Schmidt orthogonalisation 178, 329 
group 45, 60, 110 


(Mm,n, +) 45
(Rⁿ, +) 110
(V, +) 115


axioms 45, 110 
invertible n × n matrices 60


Hachette, Jean Nicolas Pierre 350 
Heisenberg, Werner 286 
Helmholtz, Hermann von 286 
Hilbert, David 286 

homogeneous system 11, 288 
hyperbola 334 

hyperbolic paraboloid 348 
hyperboloid of one sheet 348 
hyperboloid of two sheets 348 


identity 109, 111, 115 
scalar multiplicative 111 
identity matrix 290, 295 
additive 44 
multiplicative 51 
identity transformation 213 


image 205, 216, 283, 305 
of a linear transformation 248 
image basis 251 
image set 248-249, 258 
image set of a linear transformation 248 
inconsistent system 10 
infinite dimension 107, 154 
infinite-dimensional vector space 212 
inverse matrix 62, 76 
2 × 2 strategy 76
additive 44 
multiplicative 57 
inverse of a linear transformation 239 
inverse pair 70 
Inverse Rule 240 
inverse vector 110 
invertibility 
linear transformation 238, 239 
square matrix 58, 85 
strategy, linear transformation 243 
strategy, square matrix 62 
system of linear equations 64-66
Invertibility Theorem 61 
proof 72-73 
isomorphic 246-247 
isomorphism 246-247 


Jordan, Wilhelm 13 
kernel 253-254, 258 


Laplace, Pierre-Simon 176 
leading diagonal see main diagonal 
leading unknown 31 
Leibniz, Gottfried Wilhelm 73 
line in R², equation 5
linear combination 122, 136, 214 
in M23 123 
in P 123 
strategy 126 
linear dependence 137 
linear equations 263 
n unknowns 9 
three unknowns 6 
two unknowns 5 
linear independence 137 
strategy 140 
subset 142 


linear polynomial P> 118 
linear transformation 205, 230-231, 283-287, 333
image set of 248 
invertibility 238, 239 
kernel of 254 
of a combination of vectors 215 
strategy 206 
strategy for matrix 305 
theorem 214, 228 
lower triangular matrix 51 


Maclaurin, Colin 74 
magnitude of polynomial 182 
magnitude of vector 180 
main diagonal 50 
matrix 22, 119 

Mo. 150, 158 

M23 119, 123 

Mm,n 119, 158 

arithmetic 42 

column of 22 

diagonal 50 

dimension 158 

entry of 22, 41 

invertible 58 

leading entry of 22 

leading 1 22 

multiplication 47, 234, 238 

operations 42, 47, 52 

product 47 

properties 87 

row of 22 

row-reduced 28-30 

transpose 52 

triangular 51 

zero row of 22 
matrix form of system of linear equations 55 
matrix representation 202-203, 217, 218, 290-291

depends on basis 227 

dilation 202 

reflection 203 

rotation 203 

scaling 202 

strategy 219 

unique 228 




minimal spanning set 136 

Monge, Gaspard 350 

multiple root 302 

multiplication in Mn,n 56

multiplication, scalar see scalar multiplication 
multiplicative identity 56 

multiplicative inverse 57 

multiplicity 303, 304 

multiplying matrices 47, 234, 238 


n-dimensional space 106 
negative of a matrix 42 
non-degenerate conic 334 
non-degenerate quadric 348 
non-homogeneous system 11 
non-leading unknown 31 
non-standard basis 217 
non-trivial solution 11 
normal vector 176 

n-tuple 106 


orthogonal basis 171, 173 
in R? 168 
in Rⁿ 173
strategy for R? 172 
strategy for Rⁿ 175
orthogonal diagonalisation 325, 338 
strategy 325 
orthogonal eigenvector 
theorem 326 
orthogonal matrix 324, 330-333 
theorem 330 
orthogonal polynomial 182 
orthogonal set 169, 173, 174 
orthogonal vectors 169, 173 
orthogonalisation (Gram-Schmidt) 178, 329 
orthonormal basis 180 
strategy 180 
orthonormal eigenvector basis 322-326, 338 
strategy 328 


Page, Larry 292 

parabola 334 

plane in R³, equation 6, 20, 21
P, 119, 150, 158 




polynomial 119, 211 
dimension 158 
linear Pp 118 
magnitude 182 
quadratic P 112 
scalar product 181 

powers of a matrix 50 

product of matrices 47 

proper subset 159 


quadratic polynomial Ps 112 
quadric 347 
degenerate 348 
equation 350 
non-degenerate 348 


standard form of equation 350 


standard position 347 
strategy 350 
the six types considered 349 


Rⁿ 106, 150, 158


R® 107 
R? > R? 205, 231 
R” > R” 931 


real vector space 115 
reflection 202, 216, 333 
matrix representation 203 
rotation 201, 211, 216, 333, 339 
matrix representation 203 
row vector 41 
row-reduced matrix 28-30, 255 
strategy 34 
uniqueness 39 
row-sum check 27 


scalar multiplication 42, 204 
in C 119


in Mm,n 119

in P> 118 

in R² and R³ 106
in R4 111 

in Rⁿ 107

in R® 120 


subspace 160 


scalar product 169, 172, 181 
scaling 201, 285, 332 

matrix representation 202 
scaling factor 245 
Schmidt, Erhard 176 
shear 208, 304 
simultaneous linear equations 5 
size (matrix) 22 
skew 208 
solution of system of equations 


solution set of system of linear equations 


Solution Set Theorem 257 
spanning set 129, 136, 250 


for Pz; 132 
minimal 136 
subset 164 


sphere 347, 348 


square matrix 41, 54, 56, 293, 303, 330 


invertibility 85 
theorem 83, 86 
standard basis 150, 305, 309, 323 


P> 218, 237 

P 237 

R? 216, 219, 226 
R? 227, 250 


standard coordinate 151 


standard coordinate representation 309 


standard form of a conic 336 
strategy 345 
standard form of a quadric 350 
strategy 350 
standard position 334, 347 
stretching see scaling 
submatrix 78 
subset 
span 164 
subspace 160, 168, 299 
strategy 160 
theorem 299 
Summary Theorem (invertibility) 
surface in R³ see quadric
symmetric matrix 54, 322, 338 
strategy 328 
theorem 326 


system of linear equations 5, 126, 140, 263, 288 
elementary operations 15 
infinitely many solutions 31 
invertibility 64-66 
matrix form 55 
n unknowns 9 
number of solutions 5-12, 264-266
solution set 10 
three unknowns 6 
two unknowns 5 


trace 298 
transition matrix 310, 313 
theorem 310 
translate the origin 341 
translation 208 
transposition of matrices (properties) 53 
triangular matrix 51 
theorem 298 




trivial solution 11 


unknowns 9 
upper triangular matrix 51 


vector addition 204 

in C 119

in Mm,n 119

in Pp 118 

in R² and R³ 106

in Rⁿ 107

in R® 120 

subspace 160 
vector combination 214 
vector space definition 115 


zero matrix 42, 45 

zero polynomial 256 

zero transformation 213 

zero vector 110, 138, 160, 205 




